Recent developments in language models point to a radical transformation of historical research practice. A fundamental question follows: who controls the tools that now shape our understanding of the past? Private giants dominate the field, and their interests often run counter to core academic values such as transparency and accessibility. The case for publicly owned language models is growing stronger, inviting a rethinking of intellectual property in favor of a genuinely inclusive and collaborative academic culture.
The evolving landscape of language models
Large language models (LLMs) are fundamentally transforming historical research. The shift stems from their ability to process, annotate, and generate texts in ways that redefine traditional scholarly workflows.
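As an illustration, here is a minimal sketch of machine-assisted annotation, assuming the open-source Hugging Face transformers library and a publicly hosted named-entity model; the model identifier and the sample passage are illustrative, not drawn from any specific project.

```python
# Minimal sketch: annotating entities in a historical passage with an
# openly hosted model. Any openly licensed token-classification model
# could be substituted for the one named below.
from transformers import pipeline

ner = pipeline(
    "token-classification",
    model="dbmdz/bert-large-cased-finetuned-conll03-english",
    aggregation_strategy="simple",
)

# A short passage standing in for a digitized archival document.
passage = (
    "In 1763, the Treaty of Paris ended the Seven Years' War, "
    "and Britain gained control of Canada from France."
)

# Print each detected person, place, or organization with its confidence.
for entity in ner(passage):
    print(entity["entity_group"], entity["word"], round(float(entity["score"]), 2))
```

Because the model runs locally, the annotation step can be inspected, versioned, and reproduced, which is precisely what closed, API-only services make difficult.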
Ownership and control of technologies
The question of who owns these tools is central to the debate. The most capable LLMs are largely developed by private companies whose primary objective is profit, which raises concerns about how those companies shape our understanding of the past.
The values of historical research
Historical research rests on fundamental values of transparency, accessibility, and cultural diversity. These principles do not always align with the goals of the companies developing LLMs, and private control of the intellectual property behind these tools therefore becomes problematic, threatening the integrity of academic discourse.
Issues associated with commercial LLMs
Two issues stand out: opacity and instability. Opacity stems from the lack of access to training data and from the biases potentially embedded in these systems. Instability arises because access terms and model capabilities can change without notice, directly affecting the researchers who rely on them.
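One partial remedy, available only when model weights are openly published, is to pin an exact revision so that results remain reproducible even if the upstream repository changes. The sketch below assumes the transformers library; the model identifier and revision hash are placeholders.

```python
# Minimal sketch: pinning an openly published model to a fixed revision
# so an analysis can be rerun unchanged months or years later.
from transformers import AutoTokenizer, AutoModelForCausalLM

MODEL_ID = "example-org/open-historical-llm"  # hypothetical open model
REVISION = "1234567890abcdef"                 # exact commit to pin (placeholder)

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, revision=REVISION)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, revision=REVISION)

# No equivalent guarantee exists for a closed, API-only model: the provider
# can alter capabilities or access terms at any time.
```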
Inequalities in the field of research
Equity is a further concern. Many researchers, particularly those working in less resourced contexts, are excluded from the advances these technologies offer, which deepens existing disparities within academic communities.
Toward public language models
Developing public, open-access LLMs for the humanities is imperative. These models should be trained on historically grounded, multilingual corpora drawn from libraries, museums, and public archives, a project that demands both academic rigor and public funding.
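Much of that source material is already exposed through OAI-PMH, the standard harvesting protocol used by many libraries and archives, so corpus building can draw on existing public infrastructure. The sketch below is a minimal illustration; the endpoint URL is a placeholder, and a real pipeline would also check rights statements and follow resumption tokens.

```python
# Minimal sketch: harvesting record metadata from a public repository
# via OAI-PMH as a first step toward building an open training corpus.
import requests
import xml.etree.ElementTree as ET

OAI_ENDPOINT = "https://example-library.org/oai"  # hypothetical repository
DC = "{http://purl.org/dc/elements/1.1/}"         # Dublin Core namespace

response = requests.get(
    OAI_ENDPOINT,
    params={"verb": "ListRecords", "metadataPrefix": "oai_dc"},
    timeout=30,
)
root = ET.fromstring(response.content)

# Print the title of each harvested record on this page of results.
for title in root.iter(f"{DC}title"):
    print(title.text)
```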
Responsibility of the humanities
The humanities must seize the opportunity to create an artificial intelligence that is both culturally aware and academically rigorous. Such responsibility includes not only the ethical use of LLMs but also their collective ownership.
Infrastructure challenges
Building the infrastructure such models require is a real challenge. A useful analogy is the management of national archives or school curricula, responsibilities that should not be handed over to private entities. The approach demands a commons that remains open and accessible to all.
Consequences for public knowledge
How LLMs are developed and controlled may well determine the future of public knowledge. An open dialogue about how these technologies shape our understanding of the world is urgently needed, and preserving academic integrity and human values remains essential in the digital age.
Proposals to reshape protections for the creative industries illustrate the tension between innovation and cultural safeguarding. It is imperative to promote solutions that emphasize access and collaboration, thereby ensuring a diversity of voices in historical narratives.
The ethical questions surrounding companies such as Meta, criticized for training on data of dubious provenance, test researchers' solidarity with their discipline and its values. As artificial intelligence becomes ever more embedded in our lives, the need for a robust ethical framework grows more urgent.
Every step toward accessible and equitable language models is progress toward a more inclusive historical dialogue, one in which everyone can share in and refine our collective understanding of the past.
Frequently Asked Questions
Why is it important that language models are publicly owned?
Public ownership of language models ensures that they remain accessible to all researchers, promotes transparency, and allows for ethical and responsible use in the humanities.
What are the risks associated with the privatization of language models?
Privatization can lead to opacity, unstable access, and unequal availability of tools, particularly for researchers working in under-resourced contexts.
How can the transparency of language models be ensured?
To guarantee transparency, training data must be accessible and potential biases must be identified and corrected, so that researchers can analyze results critically.
What type of data should be used to train public language models?
Models should be trained on historical, multilingual, and curated corpora sourced from libraries, museums, and archives to enrich cultural and academic diversity.
What are the benefits of public funding for language models?
Public funding helps maintain the independence of models, fosters collaboration among researchers, and ensures that academic values such as reproducibility and accessibility are respected.
How can researchers get involved in creating public language models?
Researchers can join development initiatives, help define standards and protocols, advocate for public funding, and share their expertise in the use of LLMs.
What consequences could the privatization of AI tools have on the future of historical research?
Privatization could create inequalities in access to interpretive tools, affecting research and limiting the production of diverse and inclusive knowledge in the historical field.
What roles should academic communities play in the development of public LLMs?
Academic communities should be active in overseeing responsible development, ensuring that research values are respected, and promoting ethics in the use of models.