Authors claim that Zuckerberg approved the use of ‘pirated’ books by Meta to train its AI models

Published on 19 February 2025 at 18:11
Modified on 19 February 2025 at 18:11

The controversy around Meta is escalating and raises major ethical issues in the field of artificial intelligence. Prominent authors claim that Mark Zuckerberg approved the use of “pirated” books to train the company’s AI models. Citing internal communications, the plaintiffs highlight Meta’s potential legal liability and call into question the tech giant’s often opaque practices. The matter has moral and legal implications that publishing and creative professionals cannot ignore.

Accusations against Meta

Authors, including the famous writer Ta-Nehisi Coates and comedian Sarah Silverman, have recently accused Meta of using copyrighted books to train its artificial intelligence models, particularly the Llama model. These accusations were revealed in a federal court document in California, where the plaintiffs presented elements they describe as evidence of Mark Zuckerberg’s direct approval.

Use of the LibGen database

According to leaked internal communications, Zuckerberg reportedly favored the use of the LibGen database, known for hosting pirated versions of books. Despite warnings from his AI executive team, which argued that the collection undoubtedly consisted of pirated material, the CEO of Meta gave his approval. An internal memo mentions that using such a dataset could harm the company’s negotiating position with regulators.

Potential legal implications

These allegations carry weight at a time when the use of protected content to train AI models is the subject of frequent and intense legal disputes. In 2023, authors filed a lawsuit against Meta, stating that the company exploited their works without their consent. Growing concern among professionals in the creative sector raises critical questions about the legality of these practices.

Evidence supporting the complaints

The court file mentions a memo indicating that, “after being brought to MZ’s attention” (Mark Zuckerberg’s initials), Meta’s AI team was “authorized to use LibGen.” This type of documentation, clearly articulating the CEO’s support for this initiative, strengthens the legitimacy of the authors’ claims.

Repercussions for Meta’s reputation

The media fallout surrounding this matter could represent a major challenge for Meta, particularly regarding its reputation with users and regulators. Company executives have already expressed concerns about the potential impact of such a controversy on their business strategies. A plaintiff’s attorney argued that any media coverage suggesting the use of a pirated dataset could undermine Meta’s negotiating position with regulatory bodies.

Future perspectives

Judge Vince Chhabria, who has examined this case, expressed some skepticism about the chances of success for the new fraud allegations as well as the claims related to copyright management. The authors now aim to revive their claims by amending their initial complaint, which could allow them to bring new facts and arguments into the proceedings. This development could pave the way for a more in-depth examination of Meta’s internal operations regarding the use of protected content.

Conclusion: to be continued

The developments in this case go beyond legal implications and also raise ethical questions about the use of copyrighted works. Upcoming hearings and court decisions will shed further light on the delicate issue of access to protected materials in the field of artificial intelligence.

FAQ on Meta’s use of pirated books

What accusations are being made against Meta regarding the use of copyrighted books?
Authors, including Ta-Nehisi Coates and Sarah Silverman, accuse Meta of using “pirated” books to train its artificial intelligence models, particularly the Llama model.
Did Mark Zuckerberg personally approve the use of these books?
According to court documents, Mark Zuckerberg reportedly approved the use of the LibGen dataset, which contains pirated books, despite warnings about legal risks.
What is the LibGen database and why is it controversial?
LibGen, or Library Genesis, is a “shadow library” that offers millions of books and articles illegally, raising ethical and legal questions about the use of its content for AI training.
What legal implications arise from Meta’s use of pirated books?
The use of pirated books to train AI models could lead to lawsuits for copyright infringement and affect Meta’s negotiating position with regulators.
How are authors trying to assert their rights against Meta?
Authors have filed a lawsuit against Meta for copyright infringement, arguing that the company used their works without permission to train its AI models.
What is the impact of these accusations on the publishing industry and authors?
These accusations highlight the legal dangers facing content creators, who fear that unauthorized use of their work could affect their revenue and business model.
What is Meta’s response to these copyright infringement accusations?
Meta has not yet made any substantial statement regarding the accusations, but legal debates continue over the use of protected content for AI training.
Is the evidence presented by the plaintiffs considered solid by the court?
The court has allowed the plaintiffs to amend their claims, indicating that they have provided sufficiently compelling evidence to justify reviving certain accusations, including computer fraud.
What are the potential consequences if Meta is found guilty of these accusations?
If Meta is found guilty, it could face heavy financial penalties, be forced to reconsider its AI training practices, and potentially alter industry standards regarding the use of protected works.
