The AI models with the most frequent hallucinations in July 2025

Published 28 July 2025 at 09:41
Updated 28 July 2025 at 09:41

Today’s AI models promise innovation and efficiency, yet they present significant challenges. _Understanding the extent of hallucinations is key to judging the reliability of their results._ The potential for compounding errors remains alarming for businesses and users alike. This phenomenon, well documented by experts, demands heightened vigilance and in-depth analysis. _Rigorous performance evaluation is imperative to assess their safety._ Recent studies reveal that some models suffer from notable gaps that compromise response quality. The stakes rise as AI spreads into ever more sectors, making critical examination of these tools vital. _A clear ranking is essential to better anticipate risks._

Review of AI Models as of July 2025

According to the Phare LLM benchmark, Meta’s Llama 3.1 stands out with the lowest hallucination rate among the models tested, making it the most reliable of the group. By contrast, the overall results of the other models are more concerning.

Performance Ranking of Models

The French startup Giskard, which publishes the benchmark, conducted an in-depth analysis of language models. Llama 3.1 ranks first with a reliability score of 85.8%. Gemini 1.5 Pro follows at 79.12%, while Llama 4 Maverick takes third place with 77.63%.

The results also highlight Claude 3.5 Haiku and Claude 3.5 Sonnet, which occupy fourth and sixth place with close scores. GPT-4o ranks a respectable fifth, despite the underperformance of its mini version, ranked fifteenth.

Poor Performances

At the bottom of the ranking, the startup Mistral posted weak results, with Mistral Small 3.1 and Mistral Large in 14th and 15th position respectively. More concerning still, Grok 2, developed by X, does not exceed an overall score of 61.38%, with an alarming 27.32% on resistance to attempts to unlock blocked functions.

Ranking Criteria in the Phare LLM Benchmark

The Phare LLM benchmark evaluates models on four distinct criteria. First, hallucination resistance checks the accuracy of the information a model provides. Second, harm resistance evaluates whether the AI produces dangerous or harmful behavior.

Next, bias resistance tests the AI’s ability to avoid bias, including its capacity to handle questions phrased in a leading or slanted way. Finally, jailbreak resistance measures how well a model withstands attempts to coax it into prohibited behavior.
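The article does not say how Phare combines the four criteria into the single reliability scores quoted above. As a purely hypothetical illustration, assuming an equal-weight average of per-criterion scores (an assumption, not Phare's documented formula), the aggregation could look like this:

```python
# Hypothetical sketch: how four criterion scores might be combined into one
# overall reliability figure. The equal weighting and the per-criterion
# values below are assumptions for illustration only; Phare's real scoring
# method is not described in the article.

def overall_reliability(hallucination: float, harm: float,
                        bias: float, jailbreak: float) -> float:
    """Average four criterion scores (each on a 0-100 scale)."""
    return (hallucination + harm + bias + jailbreak) / 4

# Made-up per-criterion values chosen so the average lands on 85.8,
# the overall score the article reports for Llama 3.1.
score = overall_reliability(90.0, 85.0, 80.0, 88.2)
print(round(score, 2))
```

A weighted average, a minimum over criteria, or a harder pass/fail gate would all rank models differently, which is one reason benchmark methodology matters when comparing headline scores.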

Implications for the Future of AIs

The podium placement of Llama 3.1 and the other leading models underscores the importance of building safe and reliable AI systems. Increased attention must be paid to the lower-performing models, such as Grok 2, to prevent the consequences of their misuse.

This ranking also feeds ongoing debates about how AI systems should be developed and evaluated. Users’ expectations of ever-higher performance raise essential ethical questions.

Concerns regarding AI safety are emphasized, creating space for deep reflection on the impact of these technologies in various fields. Continuous vigilance is necessary to ensure that technological advancements do not compromise the reliability and integrity of AIs.

FAQs Regarding AI Models with the Most Frequent Hallucinations as of July 2025

What are the most reliable AI models in terms of hallucinations in July 2025?
The most reliable AI models in July 2025 according to the Phare LLM benchmark include Llama 3.1, Gemini 1.5 Pro, and Llama 4 Maverick, which are distinguished by their low hallucination rates.

What is a hallucination in the context of AI models?
A hallucination in the context of AI models refers to a situation where the AI generates incorrect or inaccurate information, often creating non-existent details in its responses.

How are AI models evaluated in terms of hallucinations?
AI models are evaluated on four criteria: resistance to hallucinations, resistance to damage, resistance to polarization, and resistance to jailbreak. These criteria help estimate their overall reliability.

Why is Llama 3.1 considered the best AI model against hallucinations?
Llama 3.1 ranks first with a reliability level of 85.8%, demonstrating its ability to provide accurate information while avoiding the creation of false elements.

What is the failure rate of Grok 2 compared to other AI models?
Grok 2 has the lowest overall reliability score in the ranking, at just 61.38%, which raises concerns about its frequent hallucinations.

What impacts can hallucinations of AI models have on users?
Hallucinations can mislead users, provide inappropriate advice, or even harmful information, thereby affecting trust in these technologies.

How can users verify the reliability of the answers given by AI models?
Users should always cross-check the information provided by AI models with reliable sources and ensure that the answers do not contain invented or erroneous elements.

Which models are the worst in terms of hallucinations, according to the ranking?
The worst models in terms of hallucinations include Grok 2 and GPT-4o mini, which show reliability scores below 70%.

