CPU-based generative AI: Intel and Ampere argue their chips can handle smaller models. Optimizations and hardware advances have reduced the performance penalty associated with CPU-only AI. Intel's Granite Rapids Xeon 6 and Ampere's Altra CPUs have shown promising results on small LLMs. CPUs are unlikely to replace GPUs for larger models due to memory and compute bottlenecks, but they show potential for enterprise applications built around smaller models.
CPUs Emerge as Viable Option for Running Small Generative AI Models
Amidst the proliferation of generative AI chatbots like ChatGPT and Gemini, discussions have centered on their dependence on high-performance computing resources such as GPUs and dedicated accelerators. However, recent advancements in CPU technology are challenging this paradigm, suggesting that CPUs can effectively handle smaller generative AI models.
Performance Enhancements through Software Optimizations and Hardware Improvements
Traditionally, running large language models (LLMs) on CPU cores has been hampered by slower performance. However, ongoing software optimizations and hardware enhancements are bridging this performance gap.
Intel has showcased promising results with its upcoming Granite Rapids Xeon 6 processor, demonstrating that it can run Meta's Llama2-70B model with a second-token latency of 82 milliseconds (ms), a significant improvement over its previous Xeon processors. Oracle has also reported impressive performance running the Llama2-7B model on Ampere's Altra CPUs, achieving throughput ranging from 33 to 119 tokens per second.
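These figures are roughly consistent with a simple rule of thumb: single-stream LLM decoding on CPUs is typically memory-bandwidth bound, because every generated token must stream the full set of weights from memory once. The sketch below runs that back-of-envelope check; the bandwidth figure and quantization widths are illustrative assumptions, not vendor specifications.

```python
def decode_tokens_per_sec(params_billion: float, bytes_per_param: float,
                          mem_bandwidth_gbs: float) -> float:
    """Upper bound on single-stream decode rate: each generated token
    requires streaming all model weights from memory once."""
    model_gb = params_billion * bytes_per_param
    return mem_bandwidth_gbs / model_gb

# Llama2-7B in 8-bit on a CPU with ~200 GB/s DRAM bandwidth (assumed figure)
print(round(decode_tokens_per_sec(7, 1.0, 200), 1))  # ~28.6 tokens/s ceiling

# Llama2-70B in 4-bit: an 82 ms second-token latency (~12 tokens/s)
# implies the platform must sustain roughly model_gb / 0.082 GB/s
model_gb = 70 * 0.5
print(round(model_gb / 0.082))  # ~427 GB/s of effective bandwidth
```

The second calculation suggests why faster memory (discussed below) matters: a 70B-parameter model at that latency needs several hundred GB/s of sustained bandwidth, well beyond a conventional dual-socket DDR5 setup.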
Customizations and Collaborations Enhance Performance
These performance gains are attributed to custom software libraries and optimizations made in collaboration with Oracle. Intel and Oracle have subsequently shared performance data for Meta's newly launched Llama3 models, which exhibit similar characteristics.
Suitability for Small Models and Potential for Modestly Sized Models
Based on the available performance data, CPUs have emerged as a viable option for running small generative AI models. It is anticipated that CPUs may soon be capable of handling modestly sized models, especially at lower batch sizes.
Persistent Bottlenecks Keep GPUs and Accelerators Essential for Larger Models
While CPUs demonstrate improved performance for generative AI workloads, it is important to note that various compute and memory bottlenecks prevent them from fully replacing GPUs or dedicated accelerators for larger models. For state-of-the-art generative AI models, specialized products like Intel's Gaudi accelerator are still necessary.
Overcoming Memory Limitations through Innovative Technologies
Unlike GPUs, CPUs rely on less expensive and more capacious DRAM modules for memory, which presents a significant advantage for running large models. However, CPUs are constrained by limited memory bandwidth compared to GPUs with HBM modules.
Intel's Granite Rapids Xeon 6 platform addresses this limitation with the introduction of Multiplexer Combined Rank (MCR) DIMMs, which facilitate much faster memory access. This technology, combined with Intel's enhanced AMX engine, doubles the effective performance and reduces model footprint and memory requirements.
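The footprint reduction from lower-precision formats is straightforward to quantify. The sketch below compares weight storage for a 70B-parameter model at several bit widths; on a bandwidth-bound CPU, halving the weight precision also halves the bytes streamed per token, which is where the "doubled effective performance" comes from. The bit widths chosen are illustrative.

```python
def weight_footprint_gb(params_billion: float, bits_per_weight: int) -> float:
    """Storage for the weights alone; activations and KV cache add more."""
    return params_billion * bits_per_weight / 8

for bits in (16, 8, 4):
    print(f"Llama2-70B at {bits}-bit: {weight_footprint_gb(70, bits):.0f} GB")
# Halving precision halves both the memory needed to hold the model and
# the bytes streamed per generated token, roughly doubling decode speed
# on a bandwidth-bound CPU.
```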
Balanced Approach to AI Capability Optimization
CPU designers face the challenge of optimizing their products for a wide range of AI models. Instead of prioritizing the ability to run the most demanding LLMs, vendors focus on identifying the distribution of models and targeting enterprise-grade workloads.
Data from both Intel and Ampere suggests that the sweet spot for AI models in the current market lies within the 7-13 billion parameter range. These models are expected to remain mainstream, while frontier models may continue to grow in size at a slower pace.
Competitive Performance Against GPUs at Low Batch Sizes
Ampere's testing showed its CPUs performing competitively against AWS's Arm-based CPUs and Nvidia's A10 GPU at small batch sizes. However, GPUs gain the advantage at higher batch sizes thanks to their massive compute capacity.
Nonetheless, Ampere argues that the scalability of CPUs makes them more suitable for enterprise environments where the need for large-scale parallel processing is less common.
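The batch-size effect follows from the same bandwidth-bound reasoning as before: one pass over the weights can serve every request in a batch, so aggregate throughput grows with batch size until the chip's compute ceiling takes over. A minimal roofline-style sketch, with all constants (model size, bandwidth, compute cap) assumed for illustration:

```python
def batched_throughput(batch: int, model_gb: float, mem_bw_gbs: float,
                       compute_tps_cap: float) -> float:
    """Roofline-style estimate: one sweep over the weights serves the
    whole batch, so bandwidth-bound throughput scales linearly with
    batch size until the compute ceiling is hit."""
    bandwidth_bound = batch * mem_bw_gbs / model_gb  # tokens/s across streams
    return min(bandwidth_bound, compute_tps_cap)

# Assumed: 7 GB model (7B int8), 200 GB/s CPU, 400 tokens/s compute cap
for b in (1, 4, 16, 64):
    print(b, round(batched_throughput(b, 7.0, 200.0, 400.0)))
```

Under these assumptions the CPU saturates at modest batch sizes, while a GPU's much higher compute ceiling lets it keep scaling, which matches Ampere's observation that CPUs compete best where large-batch parallelism is uncommon.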
Conclusion
As generative AI technology evolves, CPUs are emerging as a viable option for running small and potentially modestly sized models, thanks to ongoing performance enhancements and innovative memory solutions. While GPUs and dedicated accelerators remain essential for larger models, CPUs are poised to play a significant role in the practical deployment of AI solutions for enterprise applications.