CPUs Emerge as Candidates for Small Generative AI Models

2024/05/01 19:24

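CPU-based generative AI: Intel and Ampere claim their chips can handle smaller models. Optimizations and hardware advances are reducing the performance penalty that comes with CPU-only AI. Intel's Granite Rapids Xeon 6 and Ampere's Altra CPUs have shown promising results with small LLMs. Although memory and compute bottlenecks keep CPUs from replacing GPUs for large models, they show potential for enterprise applications that run smaller models.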

CPUs Emerge as Viable Option for Running Small Generative AI Models

Amidst the proliferation of generative AI chatbots like ChatGPT and Gemini, discussions have centered on their dependence on high-performance computing resources such as GPUs and dedicated accelerators. However, recent advancements in CPU technology are challenging this paradigm, suggesting that CPUs can effectively handle smaller generative AI models.

Performance Enhancements through Software Optimizations and Hardware Improvements

Traditionally, running large language models (LLMs) on CPU cores has been hampered by slower performance. However, ongoing software optimizations and hardware enhancements are bridging this performance gap.

Intel has showcased promising results with its upcoming Granite Rapids Xeon 6 processor, running Meta's Llama2-70B model with a second-token latency (the time to generate each token after the first) of 82 milliseconds (ms), a significant improvement over its previous Xeon processors. Oracle has also reported running the Llama2-7B model on Ampere's Altra CPUs at throughputs ranging from 33 to 119 tokens per second.
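
Latency and throughput figures like these are two views of the same quantity. The minimal sketch below converts between them, assuming a single request stream in which each token waits for the previous one, so the per-stream rate is simply the reciprocal of per-token latency. Oracle's figure may be aggregate throughput across several concurrent requests, which this simple conversion ignores.

```python
def tokens_per_second(latency_ms: float) -> float:
    """Per-stream decode rate implied by a per-token latency in milliseconds."""
    return 1000.0 / latency_ms


def ms_per_token(throughput_tps: float) -> float:
    """Average time per token implied by a throughput in tokens per second."""
    return 1000.0 / throughput_tps


if __name__ == "__main__":
    # Intel's reported second-token latency for Llama2-70B on Granite Rapids.
    print(f"82 ms/token -> {tokens_per_second(82):.1f} tokens/s")
    # Oracle's reported Llama2-7B throughput range on Ampere Altra.
    for tps in (33, 119):
        print(f"{tps} tokens/s -> {ms_per_token(tps):.1f} ms/token")
```

Under that single-stream assumption, 82 ms per token works out to roughly 12 tokens per second, while 33 to 119 tokens per second corresponds to an average of about 8 to 30 ms per token.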

Customizations and Collaborations Enhance Performance

These performance gains are attributed to custom software libraries and optimizations made in collaboration with Oracle. Intel and Oracle have subsequently shared performance data for Meta's newly launched Llama3 models, which exhibit similar characteristics.

Suitability for Small Models and Potential for Modestly Sized Models

Based on the available performance data, CPUs have emerged as a viable option for running small generative AI models. They may soon be able to handle modestly sized models as well, especially at lower batch sizes.

Persistent Bottlenecks Limit Replaceability of GPUs and Accelerators for Larger Models

While CPUs show improved performance on generative AI workloads, various compute and memory bottlenecks still prevent them from fully replacing GPUs or dedicated accelerators for larger models. State-of-the-art generative AI models still require specialized products such as Intel's Gaudi accelerator.

Overcoming Memory Limitations through Innovative Technologies

Unlike GPUs, CPUs rely on less expensive and more capacious DRAM modules for memory, which presents a significant advantage for running large models. However, CPUs are constrained by limited memory bandwidth compared to GPUs with HBM modules.
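
Why bandwidth matters here can be seen with a common back-of-the-envelope model. This model is an assumption of this sketch, not a figure from Intel or Ampere. It treats single-stream token generation as memory-bound, because every new token requires streaming all model weights from memory once, so the upper bound on decode speed is memory bandwidth divided by weight size. The bandwidth values are hypothetical placeholders chosen only to illustrate a DRAM-versus-HBM gap, not specifications of any product.

```python
def max_decode_tokens_per_s(params_billion: float, bytes_per_param: float,
                            bandwidth_gb_s: float) -> float:
    """Upper bound on single-stream decode rate when weight reads dominate.

    Ignores KV-cache traffic, compute limits, and caching effects, so real
    throughput will be lower than this bound.
    """
    weight_gb = params_billion * bytes_per_param  # (1e9 params * bytes) / 1e9
    return bandwidth_gb_s / weight_gb


if __name__ == "__main__":
    # Hypothetical bandwidths in GB/s, for illustration only.
    systems = {"DRAM-based CPU socket": 500.0, "HBM-based GPU": 3000.0}
    for name, bw in systems.items():
        rate = max_decode_tokens_per_s(7, 2, bw)  # 7B params at 16-bit (2 bytes)
        print(f"{name}: <= {rate:.1f} tokens/s for a 16-bit 7B model")
```

With these placeholder numbers the bound is about 36 tokens/s for the DRAM system versus about 214 tokens/s for the HBM system. DRAM's larger capacity lets a CPU hold big models, but its lower bandwidth caps how fast those weights can be streamed for each token.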

Intel's Granite Rapids Xeon 6 platform addresses this limitation by introducing Multiplexer Combined Rank (MCR) DIMMs, which deliver substantially higher memory bandwidth than conventional DIMMs. According to Intel, this technology, combined with its enhanced AMX engine, doubles effective performance while reducing model footprint and memory requirements.
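
Peak DIMM bandwidth follows from simple arithmetic: transfers per second, times bytes per transfer, times the number of channels. The sketch below assumes a standard 64-bit (8-byte) data path per channel. The 6400 and 8800 MT/s rates and the 12-channel socket are illustrative assumptions, not confirmed Granite Rapids specifications.

```python
def peak_bandwidth_gb_s(mt_per_s: int, channels: int,
                        bytes_per_transfer: int = 8) -> float:
    """Theoretical peak memory bandwidth in GB/s (decimal gigabytes).

    mt_per_s: mega-transfers per second per channel.
    bytes_per_transfer: data width per channel (8 bytes for a 64-bit bus).
    """
    return mt_per_s * 1e6 * bytes_per_transfer * channels / 1e9


if __name__ == "__main__":
    # Hypothetical comparison: conventional DDR5 vs. faster MCR DIMMs,
    # both on an assumed 12-channel socket.
    for label, rate in (("DDR5-6400", 6400), ("MCR 8800 MT/s", 8800)):
        print(f"{label}: {peak_bandwidth_gb_s(rate, 12):.1f} GB/s peak")
```

Under these assumptions the faster DIMMs raise theoretical peak bandwidth from 614.4 to 844.8 GB/s, a 37.5 percent increase (8800 / 6400 = 1.375). Sustained bandwidth in practice is lower than the theoretical peak.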

Balanced Approach to AI Capability Optimization

CPU designers face the challenge of optimizing their products for a wide range of AI models. Rather than prioritizing the ability to run the most demanding LLMs, vendors focus on identifying which model sizes are most widely deployed and on targeting enterprise-grade workloads.

Data from both Intel and Ampere suggests that the sweet spot for AI models in the current market lies within the 7-13 billion parameter range. These models are expected to remain mainstream, while frontier models may continue to grow in size at a slower pace.
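
A rough sense of why this range suits CPU servers comes from a simple estimate: weight memory is approximately parameter count times bytes per parameter. This sketch ignores KV cache, activations, and runtime overhead, so real requirements are somewhat higher. The precisions listed are common choices, not ones the vendors specified.

```python
# Bytes needed to store one parameter at each (assumed) precision.
PRECISION_BYTES = {"FP32": 4.0, "BF16": 2.0, "INT8": 1.0, "INT4": 0.5}


def weight_footprint_gb(params_billion: float, precision: str) -> float:
    """Approximate weight storage in decimal GB for a given precision."""
    return params_billion * PRECISION_BYTES[precision]


if __name__ == "__main__":
    for size in (7, 13, 70):
        row = ", ".join(
            f"{p}: {weight_footprint_gb(size, p):.1f} GB" for p in PRECISION_BYTES
        )
        print(f"{size}B -> {row}")
```

Even at 16-bit precision a 13B model needs only about 26 GB for its weights, comfortably within a typical server's DRAM, whereas a 70B model at the same precision needs about 140 GB.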

Competitive Performance Against GPUs at Low Batch Sizes

Ampere's testing showed its CPUs performing competitively with Arm CPUs from AWS and with Nvidia's A10 GPU at small batch sizes. At larger batch sizes, however, GPUs pull ahead thanks to their much greater compute capacity.
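
Why GPUs pull ahead as batches grow can be illustrated with a simplified roofline model. This model is an assumption of this sketch, not Ampere's methodology. During decoding the weights are read once per step regardless of batch size, so a larger batch amortizes memory traffic until compute becomes the limit. Each generated token is approximated as costing 2 FLOPs per parameter. All device numbers below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Device:
    name: str
    bandwidth_gb_s: float  # sustained memory bandwidth, GB/s (hypothetical)
    tflops: float          # sustained dense compute, TFLOP/s (hypothetical)


def decode_tokens_per_s(dev: Device, params_billion: float,
                        bytes_per_param: float, batch: int) -> float:
    """Aggregate decode throughput under a simple roofline model.

    Each decode step streams all weights once (memory time) and performs
    about 2 FLOPs per parameter per sequence in the batch (compute time).
    The slower of the two sets the step time. KV-cache traffic is ignored.
    """
    weight_bytes = params_billion * 1e9 * bytes_per_param
    memory_time = weight_bytes / (dev.bandwidth_gb_s * 1e9)
    compute_time = batch * 2 * params_billion * 1e9 / (dev.tflops * 1e12)
    return batch / max(memory_time, compute_time)


if __name__ == "__main__":
    # Hypothetical devices: equal bandwidth, a 6x gap in compute.
    cpu = Device("CPU (hypothetical)", bandwidth_gb_s=600, tflops=20)
    gpu = Device("GPU (hypothetical)", bandwidth_gb_s=600, tflops=120)
    for batch in (1, 4, 16, 64):
        c = decode_tokens_per_s(cpu, 7, 2, batch)
        g = decode_tokens_per_s(gpu, 7, 2, batch)
        print(f"batch {batch:>2}: CPU {c:7.1f} tok/s, GPU {g:7.1f} tok/s")
```

With these made-up numbers and a 16-bit 7B model, the two devices tie at batch sizes up to 16 because both are bandwidth-bound. By batch 64 the CPU has hit its compute ceiling (about 1,429 tokens/s) while the GPU keeps scaling, which mirrors the qualitative pattern Ampere describes.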

Nonetheless, Ampere argues that the scalability of CPUs makes them more suitable for enterprise environments where the need for large-scale parallel processing is less common.

Conclusion

As generative AI technology evolves, CPUs are emerging as a viable option for running small and potentially modestly sized models, thanks to ongoing performance enhancements and innovative memory solutions. While GPUs and dedicated accelerators remain essential for larger models, CPUs are poised to play a significant role in the practical deployment of AI solutions for enterprise applications.
