AI’s accuracy in jeopardy? Contractors forced to evaluate Gemini’s responses outside their areas of expertise

Published on 20 February 2025 at 6:20 PM
Modified on 20 February 2025 at 6:20 PM

The accuracy of AI has become a crucial issue in light of Google’s new guidelines for Gemini. Contractors must now evaluate responses without sufficient mastery of certain subjects, potentially compromising the quality of the information provided. The reliability of the generated answers depends closely on the expertise of the evaluators, raising serious questions about the effectiveness of the review system. Google’s recent policy change forces contractors to risk approving content they are not qualified to judge, and this shift in how data relevance is reviewed raises concerns about the accuracy of the responses this AI provides.

Google’s New Policy Regarding Gemini AI

A major change in Google’s internal policy regarding its chatbot Gemini raises concerns about the reliability of the information it provides. Contractors tasked with evaluating the AI’s responses must now handle prompts that exceed their area of expertise. The change requires them to rate responses regardless of their level of knowledge.

Evaluation of Responses by External Agents

Until recently, agents at GlobalLogic, a contracting company owned by Hitachi, had the option to skip overly technical prompts or those beyond their understanding. In other words, a worker without medical training could choose not to evaluate a response concerning a rare disease. The new guidelines require each contractor to examine all entries, with no option to opt out, except in specific cases such as incomplete responses or those containing harmful content requiring special approval.

Concerns About the Accuracy of Results

This evolution raises questions about the accuracy of the responses provided by Gemini on sensitive topics such as health or technical fields. Contractors, when faced with less familiar areas, could approve responses containing serious errors. One agent expressed their dismay on an internal channel, questioning the purpose of this policy: “I thought skipping prompts was aimed at improving accuracy.”

Potential Impact on Users

The risk of inaccuracies in the information provided by Gemini could have far-reaching consequences for users who rely on this tool for reliable answers. Approvals made by individuals lacking expertise on critical questions could mislead, particularly in contexts where an informed decision is necessary.

A Controversial Policy Within Google

This change in the response evaluation policy has generated controversy within the company itself. Agents worry about their ability to provide valid evaluations when forced to navigate unfamiliar domains. The previous wording clearly stipulated that any agent lacking critical expertise was encouraged to skip complex tasks. The updated version reverses this logic, leading to tension and frustration among employees.

Future Prospects for Gemini AI

The uncertainty surrounding this policy’s impact on Gemini’s accuracy highlights the challenges that technology companies must face. As AI evolves, the need for high-quality responses becomes imperative. Careful training of evaluators and limits on which prompts they must rate may be essential to ensure reliable results.

FAQ on AI Accuracy and Gemini Response Evaluation

What are the new Google policies regarding Gemini and the evaluation of responses by contractors?
Google has recently updated its internal guidelines for Gemini, requiring contractors to evaluate all responses, even those that require specialized expertise they do not possess. This policy aims to reduce the flexibility previously granted to evaluators.
How can the obligation to evaluate technical areas harm Gemini’s accuracy?
By forcing evaluators to judge responses in areas they do not master, there is an increased risk of approving incorrect responses, leading to a decrease in the accuracy of Gemini’s outputs on critical topics.
What consequences might this policy have on user trust in Gemini?
This approach may create doubts about Gemini’s reliability on sensitive topics, such as health or technology, which could lead users to stop treating AI responses as a valid source of information.
How do contractors express their concerns regarding the new guidelines?
Many contractors have expressed their frustration in internal communications, noting that the ability to skip technical prompts was a means to ensure greater accuracy in response evaluations.
Under what conditions can a contractor still skip an evaluation?
Contractors can only skip an evaluation if the prompt or response is deemed incomplete, or if it contains harmful content requiring special approval to be evaluated.
How does this situation affect the perception of AI in critical sectors, such as health?
The pressure to judge responses in complex areas without relevant expertise could lead to faulty recommendations, creating an environment where decisions based on inaccurate information could harm individuals in sensitive situations.
What measures can be taken to ensure the quality of response evaluations by contractors?
Additional training, support from domain experts, and the establishment of specific evaluation protocols could be solutions to improve the quality of evaluations despite the new constraints.
Why is it important to have specialized evaluators for certain AI queries?
Having specialized evaluators ensures that responses are not only accurate but also relevant and contextualized, which is essential in fields where a mistake could have serious consequences.
What is the long-term impact of evaluation errors on generative AI?
Accumulated evaluation errors can lead to biases in AI models, thereby reducing their effectiveness and credibility in the long term, which could have repercussions on their adoption and use in various sectors.
