Meta under fire for using hacked data in the development of its artificial intelligence

Published on 19 February 2025 at 16:04
Updated on 19 February 2025 at 16:04

Meta faces legal scrutiny over accusations that it used copyrighted works to develop its artificial intelligence. The case raises questions about the legality and ethics of the practices adopted by tech giants. According to court filings, Meta is suspected of having used pirated datasets from controversial sources to train its AI models.
The accusations go beyond the mere use of protected works; they also allege the deliberate removal of copyright management information. The situation highlights the need for robust regulation of data extraction practices. This case could redefine the contours of copyright in the digital environment.

Meta under fire for using hacked data

The plaintiffs in the Kadrey et al. vs. Meta case have filed a motion accusing the company of knowingly using works protected by copyright in the development of its artificial intelligence models.

Among the plaintiffs is author Richard Kadrey, who submitted a “Reply in support of plaintiffs’ motion for permission to file a third amended consolidated complaint” to the U.S. District Court for the Northern District of California.

Systematic piracy and illegal exploitation

The court documents allege that Meta systematically used torrents and removed copyright management information (CMI) from pirated datasets, including those from the shadow library LibGen.

According to the filings, the evidence points to the involvement of Meta’s executives in these methods. Plaintiffs argue that CEO Mark Zuckerberg gave explicit approval for the use of the LibGen dataset, despite concerns raised by the company’s AI officials.

An internal memo from December 2024 acknowledged that LibGen was “a dataset we know to be pirated.” Discussions then emerged regarding the ethical and legal implications of employing such materials.

Internal communication and hesitations

Internal exchanges suggest that, after acquiring the LibGen dataset, Meta removed copyright management information from protected works. This practice is central to the plaintiffs’ accusations of copyright infringement.

Michael Clark, a representative of Meta, stated that the company implemented scripts to remove any indication of copyright, including keywords like “copyright” and “acknowledgements.” This filtering aimed to prepare the dataset for training Meta’s Llama AI models.

The ethical and legal implications

The allegations weigh heavily on Meta’s image, portraying the company as involved in a large-scale piracy scheme. Emails between Meta engineers reveal concerns about the optics of torrenting from company laptops.

One engineer wrote that “using a [Meta] laptop for torrenting doesn’t seem right,” yet according to the plaintiffs the downloading and distribution of the pirated data went ahead.

The plaintiffs’ legal counsel asserted that in January 2024, Meta had “already torrented (downloaded and distributed) data from LibGen.” Furthermore, several related documents had initially been obtained by Meta but were withheld during the early phases of discovery.

Zuckerberg’s statements and expanding the complaint

During testimony on December 17, 2024, Zuckerberg reportedly admitted that such actions would raise “a lot of red flags” and acknowledged that it “seems problematic,” although he provided few direct answers regarding Meta’s AI training practices.

Initially, the case focused on intellectual property infringement due to the AI’s use of protected materials. The plaintiffs are now seeking to add two major claims: a violation of the Digital Millennium Copyright Act (DMCA) and a violation of the California Comprehensive Computer Data Access and Fraud Act (CDAFA).

The potential effects on AI legislation

The plaintiffs claim that Meta deliberately removed copyright protections to conceal unauthorized uses of protected texts in its Llama models.

The allegations regarding the CDAFA concern the acquisition methods used to obtain the LibGen dataset, including acts of torrenting to obtain protected content.

In their internal communications, engineers openly expressed their concerns about the legality of seeding and torrenting, noting that this could pose legal issues.

An impact on copyright and creators

This dispute illustrates the growing need for clarifications regarding the intersection between copyright law and the development of AI. The plaintiffs argue that the removal of copyright protections deprives creators of their fair compensation.

Meta continues to deny all allegations in this case and has yet to publicly respond to statements made during Zuckerberg’s testimony.

This situation arises as global tension surrounding generative AI technologies rises. Other companies, such as OpenAI and Google, are also facing scrutiny regarding the use of protected data for training their models.

Meta must confront these allegations as it positions AI as a central axis of its future strategy, with accusations of reliance on pirated libraries threatening its leadership ambitions in the field.

The Kadrey et al. vs. Meta case could have significant repercussions on the future development of AI models, paving the way for meaningful legal precedents.

FAQ about Meta and the use of hacked data

What are the accusations against Meta regarding the use of hacked data?
Meta is accused of having exploited copyrighted works, including books, to train its artificial intelligence models like Llama, without permission from copyright holders.
How does Meta justify the use of this hacked data?
Meta has not provided a satisfactory explanation thus far. Allegations suggest that executives within the company, including Mark Zuckerberg, approved the use of this data despite ethical and legal concerns.
What are the potential impacts of this case on artificial intelligence development?
This case could create legal precedents regarding copyright and training practices for AI models, potentially influencing how tech companies acquire and use data in the future.
What laws are involved in this case?
The accusations include violations of the Digital Millennium Copyright Act (DMCA) and the California Comprehensive Computer Data Access and Fraud Act (CDAFA), which protect copyright and regulate access to data.
Can content creators seek compensation from Meta?
Yes, authors and rights holders can file lawsuits for copyright infringement and seek compensation if the accusations against Meta are validated.
How could the case affect Meta’s reputation?
The alleged dependence of Meta on hacked data could harm its reputation, especially in a context where it strives to maintain its leadership position in technological innovation.
What are the main ethical issues highlighted in this case?
Ethical issues include the legitimacy of using protected works for training AI models, as well as the respect for creators’ rights in an ever-evolving digital environment.
What is the public and authorities’ reaction to Meta’s actions?
The reaction is mainly negative, with growing criticism about the impact of such practices on creators and the fairness of AI systems developed with unauthorized data.
What legal framework is currently surrounding this case?
This case is currently ongoing in the U.S. District Court for the Northern District of California, where the plaintiffs seek to assert their rights and expand the accusations against Meta.
What are the implications for other tech companies?
The outcomes of this case could prompt other companies to review their data acquisition practices and respect copyright more to avoid similar legal issues.
