Ant Group is shaking up the AI sector by integrating domestic chips into its technological ecosystem. This strategy aims to reduce the cost of training models while mitigating reliance on restricted American technology. The use of methods such as Mixture of Experts marks a significant advancement for Chinese companies. By addressing the challenge of limited access to high-end hardware, Ant Group is driving a notable shift in artificial intelligence. Preliminary results suggest a future in which Chinese companies can compete with firms in countries that currently dominate AI hardware.
Use of Chinese Chips for AI Model Training
Ant Group is adopting a bold strategy by turning to domestic chips to train its artificial intelligence models. This initiative responds to the need to reduce costs and lessen reliance on restricted American technologies. Recent reports indicate that the company has already integrated chips from domestic suppliers, including those affiliated with Alibaba and Huawei Technologies, into its model training process.
Performance Comparable to Nvidia
Models Ant Group trained using the Mixture of Experts (MoE) method already rival the performance of models trained on Nvidia's H800 chips. Although the company continues to use some Nvidia chips in its AI development, it is increasingly exploring alternatives from AMD and Chinese chip manufacturers. This diversification underscores Ant's position in the growing competition between Chinese and American technology companies.
Advancements in Cost Reduction
Ant Group has published a research paper reporting that its models sometimes outperform comparable models from Meta, a significant milestone for the company. If these results hold up, Ant could take a further step toward reducing the cost of running AI applications while decreasing reliance on foreign hardware. Analysts and experts remain cautious, however, about whether reliable results can be achieved without resorting to high-end GPUs.
The Principle of MoE Models
MoE models split a large network into specialized expert subnetworks and activate only a subset of them for each input, reducing the computation needed per token. This approach has sparked keen interest among AI researchers and data scientists. Ant Group has explicitly targeted the goal of lowering the cost barrier of acquiring high-performance GPUs: the title of its research paper emphasizes "Scaling Models without premium GPUs".
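The routing idea behind MoE can be sketched in a few lines. The dimensions, weight matrices, and the `moe_forward` function below are illustrative assumptions for a toy example, not Ant's actual Ling architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy sizes: model width, number of experts, experts used per token.
d_model, n_experts, top_k = 8, 4, 2

# Each expert is a small feed-forward weight matrix.
experts = [rng.standard_normal((d_model, d_model)) * 0.1 for _ in range(n_experts)]
# The router scores how relevant each expert is for a given input token.
router = rng.standard_normal((d_model, n_experts)) * 0.1

def moe_forward(x):
    """Route a token through its top-k experts and mix their outputs."""
    scores = x @ router                       # one score per expert
    top = np.argsort(scores)[-top_k:]         # indices of the k highest-scoring experts
    weights = np.exp(scores[top])
    weights /= weights.sum()                  # softmax over the selected experts only
    # Only the selected experts run, so most parameters stay idle for this token.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

token = rng.standard_normal(d_model)
out = moe_forward(token)
```

The key cost property is that each token touches only `top_k` of the `n_experts` weight matrices, which is why MoE models can hold hundreds of billions of parameters while keeping per-token compute modest.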
Impact on the AI Sector
The direction taken by Ant Group contrasts sharply with that of Nvidia, whose CEO, Jensen Huang, highlights the necessity of a permanent increase in computing power. According to him, companies will prioritize more powerful chips, which diverges from Ant’s aspiration to progress on the cost-reduction front. Thus, the strategies of the two tech giants appear to be diametrically opposed.
Cost of Training Models
According to information disclosed by Ant, training one trillion tokens – the basic units processed by AI models – costs around 6.35 million yuan on conventional hardware. With its optimized method, Ant has managed to reduce this expense to about 5.1 million yuan, using chips with lower specifications – a saving of roughly 20%.
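The reported savings can be verified with a quick calculation; the two cost figures come from the article, while the percentage is derived:

```python
baseline = 6.35   # million yuan per trillion tokens, conventional hardware
optimized = 5.10  # million yuan with lower-specification chips

saving = (baseline - optimized) / baseline
print(f"Relative cost reduction: {saving:.1%}")
```

This puts the reduction at just under 20%, consistent with the framing of the research as lowering, rather than eliminating, hardware costs.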
Industrial Applications of AI Models
Ant intends to apply its models, named Ling-Plus and Ling-Lite, to industrial use cases such as healthcare and finance. Its acquisition of the medical platform Haodf.com demonstrates Ant's ambition to deploy AI-based solutions in the healthcare sector. The company also offers various AI services, including a virtual assistant application and a financial advisory platform.
Open Source Models and Key Figures
Ling-Lite has 16.8 billion parameters, while Ling-Plus contains 290 billion. For comparison, the closed-source GPT-4.5 reportedly has around 1.8 trillion parameters. Ant has decided to make its models open source, thereby encouraging innovation in the field of AI.
Ongoing Challenges in Model Training
Ant’s research highlights that, despite the advancements made, training models remains a technical challenge. Minor adjustments to the architecture or hardware during model training can cause unstable performance, leading to spikes in error rates.
Frequently Asked Questions
Why is Ant Group using domestic chips for its AI models?
Ant Group turns to domestic chips to reduce its AI training costs and decrease its reliance on restricted American technology, particularly in response to export restrictions on certain electronic components.
What types of domestic chips is Ant Group using for training its models?
Ant Group uses chips from domestic suppliers, including those associated with Alibaba and Huawei, to train AI models using innovative methods such as Mixture of Experts (MoE).
Has Ant Group achieved performance comparable to Nvidia chips with domestic chips?
Yes, according to sources, the performance of Ant Group's models trained on domestic chips is said to be comparable to that of models trained with Nvidia's H800 chips.
What are the advantages of using domestic chips for AI compared to foreign chips?
Advantages include a significant reduction in training costs, increased technological independence, and circumventing export restrictions that limit access to high-performance chips.
What is the main objective of the Mixture of Experts (MoE) method used by Ant Group?
MoE splits a model into specialized expert subnetworks and activates only a few of them for each input, making training more efficient and less costly.
Is Ant Group planning to apply its AI models to other sectors?
Yes, Ant Group plans to apply its models, including Ling-Plus and Ling-Lite, to industrial use cases, such as healthcare and finance.
What are the implications of open source for Ant Group’s models?
By making its models open source, Ant Group enables other organizations to use and improve its work, which could accelerate innovation in the AI sector.
What challenges does Ant Group face in training its AI models with domestic chips?
Ant Group has reported challenges related to performance instability when making minor adjustments to hardware or model architecture, which can lead to spikes in error rates.
How does Ant Group’s strategy differ from Nvidia’s in AI training?
While Nvidia focuses on the development of more powerful GPUs with more cores and memory, Ant Group aims to reduce training costs by using chips with lower specifications.
What is the training cost of one trillion tokens according to Ant Group’s research?
The training cost of one trillion tokens is estimated to be around 5.1 million yuan due to the use of lower-performing chips, compared to 6.35 million yuan with conventional hardware.