Grok-4: Elon Musk's AI redefines benchmarks

Grok-4, the latest model from xAI, the company founded by Elon Musk, sets new marks on several established AI benchmarks. Its results come in ahead of models from OpenAI, Anthropic, and Google DeepMind, which points to a significant advance.

The focus on reasoning gives Grok-4 an edge in complex tasks. Grok-4 Heavy, which orchestrates several agents in parallel, offers a different approach to problem-solving. Together, these developments point to considerable potential for innovation in the field of AI.

Grok-4: Performance Revolution in Artificial Intelligence

The Grok-4 model, developed by xAI, the start-up founded by Elon Musk, has recently surpassed the former leader, OpenAI's o3-pro, on several benchmarks. This advance is the result of intensified research on complex reasoning.

Focus on Reasoning

xAI has chosen to concentrate its efforts on reasoning rather than on building a generalist model. Grok-4 specializes in tasks requiring sophisticated thinking and advanced logic. The emphasis has been on reinforcement learning, backed by investments such as the use of 200,000 GPUs in the Colossus supercomputer.

Remarkable Benchmark Performances

This model has set impressive records on several benchmarks. On the PhD-level test Humanity's Last Exam, Grok-4 solves 26.9% of the questions in standard mode and 45% with its Heavy version. These results are presented as on par with post-doctoral-level work; by comparison, a single human could reportedly hope to score no more than about 5% on this exam.

In mathematics, Grok-4 achieves a perfect score of 100% on AIME25, surpassing o3's 98.4%. On HMMT25, it also stands out with 96.7%, compared with 82.5% for Claude Opus 4.
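To make the size of these leads concrete, the short Python sketch below computes the gap in percentage points between Grok-4 and the rival named for each math benchmark. The figures are the ones reported above; the `REPORTED` table and the `margin_in_points` helper are illustrative names introduced here, not part of any benchmark tooling.

```python
# Scores in percent, exactly as reported in the article (not independently verified).
REPORTED = {
    "AIME25": {"Grok-4": 100.0, "o3": 98.4},
    "HMMT25": {"Grok-4": 96.7, "Claude Opus 4": 82.5},
}


def margin_in_points(scores, model="Grok-4"):
    """Return (best_rival, gap), where gap is the model's score minus the
    best rival's score, in percentage points rounded to one decimal."""
    rivals = {name: score for name, score in scores.items() if name != model}
    best_rival = max(rivals, key=rivals.get)
    return best_rival, round(scores[model] - rivals[best_rival], 1)


for benchmark, scores in REPORTED.items():
    rival, gap = margin_in_points(scores)
    print(f"{benchmark}: Grok-4 leads {rival} by {gap} points")
```

On these numbers the AIME25 lead is narrow, while the HMMT25 lead is much wider.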

New Records in Fluid Intelligence

Grok-4 stands out particularly on the ARC-AGI-2 test, where it becomes the first public model to cross the 10% accuracy threshold, with 15.9%. Greg Kamradt, president of the ARC Prize Foundation, confirmed this performance. The previous best score was around 8%, held by Claude Opus 4.

Acknowledging Limitations

Although Grok-4 is at the forefront of reasoning, some of its abilities raise questions. Its multimodal performance remains basic. Elon Musk himself acknowledged that the model is "partially blind" and that its understanding of images needs improvement.

In programming, Grok-4 shows mixed results. On LiveCodeBench, it scores 79.4%, on par with Gemini 2.5 Pro and slightly below o3.

Pricing and Subscriptions

Grok-4 is available to the general public through the SuperGrok subscription at $30 per month. The SuperGrok Heavy subscription, at $300 per month, grants access to the multi-agent version. This pricing positions xAI as one of the most expensive AI providers.

The Grok API is also accessible, although its pricing has yet to be announced.

Future Perspectives

xAI has set an ambitious timetable. A specialized coding model is scheduled for August, followed by a multimodal agent in September and a video generation model in October. The competition remains fierce, with other players such as Anthropic and Google actively developing their own models.

Frequently Asked Questions

What are the main features of Grok-4?
Grok-4 focuses on complex reasoning, breaking down problems into steps and identifying logical relationships. It employs advanced reinforcement learning techniques and has a context window of 256,000 tokens.

How does Grok-4 compare to other artificial intelligence models such as those from OpenAI and Google?
Grok-4 has surpassed models such as OpenAI's o3-pro and Google's Gemini 2.5 Pro on several benchmarks, setting new records and also outperforming Anthropic's models on the tests cited.

What are Grok-4’s benchmark results?
Grok-4 achieved impressive scores: 26.9% on Humanity's Last Exam and 100% on AIME25. It also outperformed Claude Opus 4 and other competitors across various tests.

What are Grok-4’s current limitations?
While Grok-4 excels at reasoning, its multimodal capabilities are still limited, and it shows varied performance in programming, particularly on LiveCodeBench, where it scores 79.4%.

What is the Grok-4 Heavy model and how does it differ from the standard model?
Grok-4 Heavy mobilizes multiple agents in parallel to solve complex problems, thus allowing for a more robust and varied approach to the questions posed.
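As a rough illustration of the idea of several agents working in parallel, the Python sketch below runs a few stand-in "agents" on the same question at once and keeps the most common answer. This is an assumption-laden toy, not xAI's actual mechanism: the article does not say how Grok-4 Heavy combines its agents, and the agent functions, `run_agents_in_parallel`, and the majority vote are all hypothetical choices made for this example.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def run_agents_in_parallel(agents, question):
    """Run every agent on the same question concurrently.
    Answers come back in the same order as the agents."""
    with ThreadPoolExecutor(max_workers=len(agents)) as pool:
        return list(pool.map(lambda agent: agent(question), agents))


def majority_answer(answers):
    """Return the most common answer; on a tie, the one seen first wins."""
    return Counter(answers).most_common(1)[0][0]


# Hypothetical stand-ins for independent model calls.
def first_agent(question):
    return "4"


def second_agent(question):
    return "4"


def sloppy_agent(question):
    return "5"


answers = run_agents_in_parallel(
    [first_agent, second_agent, sloppy_agent], "What is 2 + 2?"
)
print(majority_answer(answers))
```

Majority voting is only one possible way to merge parallel answers; a real system might instead let the agents compare notes or have a separate model judge the candidates.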

What is the cost of accessing Grok-4 for users?
The SuperGrok subscription for Grok-4 costs $30 per month, while the SuperGrok Heavy subscription, providing access to the enhanced capabilities of Grok-4 Heavy, is offered at $300 monthly.

What future innovations are planned for Grok-4?
xAI plans to launch a specialized coding model in August, a multimodal agent in September, and a video generation model in October, adding new capabilities to the platform.
