OpenAI is facing a *major obstacle* in the development of its latest artificial intelligence model. A recent analysis points to a *lack of sufficient data* to train a system of this unprecedented complexity. OpenAI's *157-billion-dollar valuation* rests on the success of this technology, and the difficulties encountered during development underline the limits of the data available in the current digital ecosystem. The implications go beyond mere technical hurdles, raising the question of whether artificial intelligence can keep progressing in the face of these constraints.
Problems encountered by OpenAI’s latest model
A report from the Wall Street Journal revealed that OpenAI's artificial intelligence project, known as GPT-5 or Orion, is significantly behind schedule. The model requires a colossal volume of data to become operational, and it faces a troubling reality: there may not be enough data in the world to support its development.
Astronomical development costs
The push to develop a cutting-edge AI model has led to significant expenditures. Training Orion over a six-month period could cost nearly 500 million dollars; by comparison, training its predecessor, GPT-4, cost approximately 100 million dollars. These colossal sums underscore the financial pressure on OpenAI, compounded by the need to deliver a functional model.
Project guidelines
Orion was meant to surpass all of the company's previous advances, notably by making significant scientific discoveries and carrying out routine human tasks. Large-scale training runs, however, revealed significant limitations.
Lack of data on the Internet
OpenAI researchers found a lack of available data on the public internet, the source often used to train previous models. This shortfall has prompted the company to consider alternative solutions: software engineers and mathematicians have been hired to generate new data, but the process is proving laborious and time-consuming.
Use of synthetic data
At the same time, OpenAI is using synthetic data, created by the AI itself, to fuel Orion's training. This method carries risks, however: it has produced noticeable malfunctions and inappropriate responses that undermine the model's credibility, and such problems only surface after intensive training phases.
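To illustrate why training on self-generated data is risky, here is a toy sketch, not OpenAI's actual pipeline: a trivial statistical "model" (a unigram word distribution) is repeatedly retrained on its own output. Rare words tend to drop out of the generated samples, so the vocabulary can only shrink across generations, a miniature version of the degradation often called model collapse.

```python
import random
from collections import Counter

def train(corpus_words):
    # "Model": a unigram probability distribution over words.
    counts = Counter(corpus_words)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def generate(model, n_words, rng):
    # Sample synthetic "text" from the model's own distribution.
    words = list(model)
    weights = [model[w] for w in words]
    return rng.choices(words, weights=weights, k=n_words)

rng = random.Random(0)
# Hypothetical corpus: common words plus one rare word.
corpus = ["the", "cat", "sat", "on", "the", "mat"] * 50 + ["rare"]
model = train(corpus)
vocab_sizes = [len(model)]

# Retrain the model on its own synthetic output for five generations.
for _ in range(5):
    synthetic = generate(model, 200, rng)
    model = train(synthetic)
    vocab_sizes.append(len(model))

# Each generation can only keep or lose words, never gain them,
# so diversity erodes over time.
print(vocab_sizes)
```

Real training pipelines are vastly more complex, but the same feedback loop applies: errors and blind spots in the generated data are amplified with each round, which may be why problems emerge only after long training runs.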
Absence of significant progress
Ongoing testing has shown no significant progress toward the anticipated targets, and Orion's operational results do not justify the exorbitant costs incurred. The initial plan called for a model that would become a benchmark for AI use, with expertise equivalent to a PhD in artificial intelligence.
Internal challenges and external competition
OpenAI must also manage internal governance issues, including organizational instability. Several leaders, among them a co-founder who served as chief scientist, have left the company, and this turnover undoubtedly affects the project's progress.
Moreover, rivals such as Anthropic and Google are reaching significant milestones. Their models, often deemed superior, threaten OpenAI's leading position in the market. As GPT-4 ages, the pressure on Orion will only increase.
Frequently Asked Questions
Why is OpenAI’s latest model, GPT-5, encountering difficulties during training?
The model faces obstacles due to a lack of sufficient data available on the Internet for training, complicating its effective development.
What are the consequences of the lack of data for the training of the GPT-5 model?
The lack of data can lead to suboptimal performance, making it difficult for the model to function as intended at launch.
How is OpenAI attempting to address the data problem for GPT-5?
OpenAI is trying to create data from scratch by hiring software engineers and mathematicians, while also using synthetic data, but both approaches are proving long and complicated.
What are the financial implications of the development delays of GPT-5?
Delays can drive costs higher, with expenditures potentially reaching hundreds of millions of dollars without any guarantee of a finished product.
Can you explain the concept of synthetic data used by OpenAI?
Synthetic data refers to data generated by artificial intelligence to train the model, but its use has shown limitations, such as incoherent or erroneous responses.
What is the link between OpenAI’s valuation and the success of GPT-5?
The valuation of OpenAI, estimated at 157 billion dollars, heavily depends on the success of GPT-5; if the model does not perform as expected, it could negatively impact investor confidence.
What alternatives does OpenAI have for developing higher-performing AI models?
OpenAI could consider collaborating with other companies to share resources or exploring different training methods that require less data.
How much time does OpenAI anticipate for the complete development of GPT-5?
Initially, the model was expected to be available around mid-2024, but due to the difficulties encountered, this deadline may be extended.