An AI system reaches human-level performance on a general intelligence test: explanations and implications

Published on 20 February 2025 at 14:26
Updated on 20 February 2025 at 14:26

A new model of artificial intelligence

The o3 model, developed by OpenAI, has recently crossed a significant milestone. It scored 85% on ARC-AGI, a benchmark designed to measure general intelligence. That result is well above the previous best AI score of 55% and is comparable to average human performance.

Understanding the ARC-AGI test

ARC-AGI tests an AI system’s ability to adapt to new situations from a limited number of examples, a property known as sample efficiency. A system that can solve a problem from only a few clues shows that it can analyze patterns effectively. The conventional approach used by many AI models relies instead on massive datasets, which are not always available.

Generalization ability

The ability to solve new problems from a few examples, known as generalization, is fundamental to true intelligence and central to human intelligence. Current AI systems, like ChatGPT, process millions of examples to estimate probabilities, but they are not sample-efficient and struggle with uncommon tasks. Learning primarily from massive amounts of data limits their effectiveness in more varied contexts.

Tests on patterns and grids

The ARC-AGI evaluation tasks are simple grid problems in which the AI must work out how to transform an initial configuration into a target configuration. Each question provides three examples, and it is up to the AI to deduce the rule behind the transformations. These challenges are reminiscent of the IQ tests often used to measure human intelligence.
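To make the grid tasks concrete, here is a minimal sketch in Python of deducing a rule from three example pairs and applying it to a new grid. The grids, the small set of candidate rules, and the names `CANDIDATES` and `infer_rule` are illustrative assumptions, not part of ARC-AGI itself, and real tasks involve far richer transformations than these.

```python
from typing import Callable, Dict, List, Optional, Tuple

Grid = List[List[int]]  # each cell holds a small integer standing for a color


def identity(g: Grid) -> Grid:
    return [list(row) for row in g]


def flip_horizontal(g: Grid) -> Grid:
    # Mirror each row left to right.
    return [list(reversed(row)) for row in g]


def flip_vertical(g: Grid) -> Grid:
    # Mirror the grid top to bottom.
    return [list(row) for row in reversed(g)]


def transpose(g: Grid) -> Grid:
    # Swap rows and columns.
    return [list(col) for col in zip(*g)]


# Hypothetical rule library: an assumption made for this illustration only.
CANDIDATES: Dict[str, Callable[[Grid], Grid]] = {
    "identity": identity,
    "flip_horizontal": flip_horizontal,
    "flip_vertical": flip_vertical,
    "transpose": transpose,
}


def infer_rule(examples: List[Tuple[Grid, Grid]]) -> Optional[str]:
    """Return the first candidate rule that reproduces every example pair."""
    for name, rule in CANDIDATES.items():
        if all(rule(inp) == out for inp, out in examples):
            return name
    return None


# Three (input, output) demonstration pairs, as in the tasks described above.
examples = [
    ([[1, 0], [0, 0]], [[0, 1], [0, 0]]),
    ([[2, 3], [0, 0]], [[3, 2], [0, 0]]),
    ([[0, 0, 5], [7, 0, 0]], [[5, 0, 0], [0, 0, 7]]),
]

rule_name = infer_rule(examples)
test_input = [[4, 0, 0], [0, 0, 6]]
prediction = CANDIDATES[rule_name](test_input)
print(rule_name, prediction)
```

The sketch only works because the correct rule happens to be in the hand-written list. The difficulty of the benchmark lies in the fact that the rule for each new task is not known in advance.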

Adaptation and rule identification

The results suggest that o3 is remarkably adaptable. Although OpenAI has not yet disclosed all the methods behind this success, there are signs that the model finds generalizable rules from a limited number of examples. Identifying so-called weak rules, which leave more room for adaptation, appears to be an effective strategy for this model.

Chains of thought

François Chollet, the designer of ARC-AGI, describes an approach similar to that of AlphaGo, in which the AI works through chains of thought to solve tasks. This involves searching over different sequences of steps to arrive at the best solution. The o3 model might therefore select the most promising candidates using heuristics, improving its ability to solve complex problems.
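The idea of searching over sequences of steps and keeping the most promising one can be illustrated with a small Python sketch. This is not OpenAI’s method, which has not been published. The step library `STEPS`, the scoring function `heuristic`, and the exhaustive search in `best_chain` are all assumptions chosen to keep the example short.

```python
from itertools import product
from typing import Callable, Dict, List, Sequence, Tuple

Grid = List[List[int]]


def flip_horizontal(g: Grid) -> Grid:
    return [list(reversed(row)) for row in g]


def flip_vertical(g: Grid) -> Grid:
    return [list(row) for row in reversed(g)]


def transpose(g: Grid) -> Grid:
    return [list(col) for col in zip(*g)]


# Hypothetical primitive steps that a chain can be built from.
STEPS: Dict[str, Callable[[Grid], Grid]] = {
    "flip_horizontal": flip_horizontal,
    "flip_vertical": flip_vertical,
    "transpose": transpose,
}


def run_chain(chain: Sequence[str], grid: Grid) -> Grid:
    """Apply each named step in order."""
    for name in chain:
        grid = STEPS[name](grid)
    return grid


def heuristic(chain: Sequence[str], examples: List[Tuple[Grid, Grid]]) -> float:
    """Score a chain by the fraction of examples it reproduces exactly."""
    hits = sum(run_chain(chain, inp) == out for inp, out in examples)
    return hits / len(examples)


def best_chain(examples: List[Tuple[Grid, Grid]], max_len: int = 2) -> Tuple[str, ...]:
    """Enumerate every chain up to max_len steps and keep the best-scoring one."""
    candidates = [
        chain
        for length in range(1, max_len + 1)
        for chain in product(STEPS, repeat=length)
    ]
    # max() returns the first best item, so shorter chains win ties.
    return max(candidates, key=lambda chain: heuristic(chain, examples))


# Both examples rotate the grid by 180 degrees, which no single step achieves.
examples = [
    ([[1, 2], [3, 4]], [[4, 3], [2, 1]]),
    ([[0, 5], [0, 0]], [[0, 0], [5, 0]]),
]

chain = best_chain(examples)
print(chain, heuristic(chain, examples))
```

Here the single step `transpose` explains only one of the two examples, so the search prefers a two-step chain that explains both. A real system could not enumerate every chain and would need a heuristic to decide which ones to explore at all.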

Uncertainties and future perspectives

The question remains whether this progress is a tangible step towards artificial general intelligence (AGI). The efficiency of o3 may not exceed that of previous models, and the concepts it has learned do not necessarily indicate better generalization. Its potential must be evaluated in a variety of contexts to determine how it compares with humans.

The economic implications of developing an AI that can adapt at a human level are vast. Such an advance could lead to profound changes in various professional fields. A rigorous evaluation of o3’s capabilities, including its failures and successes, is necessary before its broader deployment.

Ongoing AI research calls for a thoughtful approach and raises ethical debates about its regulation and use in modern society. In this context, the attention of the media and of security institutions will be crucial in framing the latest advances in artificial intelligence.

Frequently asked questions

What is artificial general intelligence (AGI)?
Artificial general intelligence (AGI) refers to a system capable of performing any intellectual task that a human being can do. This includes the ability to understand, learn, adapt, and reason in various contexts.
How did OpenAI manage to achieve human-level results with the o3 model?
OpenAI has not disclosed all of its methods, but the o3 model appears to be highly adaptable, generalizing from a few examples. This includes identifying “weak rules” that let it solve complex problems after seeing only a limited number of examples.
What tests were used to assess OpenAI’s o3 model?
The o3 model was evaluated using the ARC-AGI benchmark, a test designed to measure an AI system’s sample efficiency by asking it to adapt to new situations from minimal examples.
How is the o3 model different from previous AI models?
Unlike other models, the o3 model was designed to spend more time “thinking” about difficult questions and has shown a better ability to make generalizations from few examples, making it more effective in adaptation.
What are the implications of AI reaching human-level performance?
Reaching human-level performance in AI could lead to a revolution in various sectors, allowing AI systems to improve autonomously and perform more complex tasks, potentially altering many aspects of society.
What challenges remain for general artificial intelligence?
Despite these advances, challenges remain, including fully understanding the capabilities of the o3 model, the risks of incorrect adaptation, and the need to develop robust regulations to manage these emerging technologies.
What is the current state of research on AI and generalization?
Research is booming. It focuses on improving sample-efficient learning, with growing interest in models that adapt quickly and effectively to new tasks.
What role do heuristics play in the operation of the o3 model?
Heuristics help the o3 model determine the best approach to a task by searching through different chains of thought, allowing it to choose the most appropriate solution and thus improve its performance.
Why is it important to understand the limitations of current AI systems?
Understanding the limitations of AI systems is crucial to avoid unrealistic expectations and to develop suitable strategies for integrating these technologies into practical applications while ensuring safety and ethics in their use.

