CPUs Emerge as Contenders for Smaller Generative AI Models
May 01, 2024 at 07:24 pm
CPU-based generative AI: Intel and Ampere argue their chips can handle smaller models. Optimizations and hardware advancements reduce performance penalties associated with CPU-only AI. Intel's Granite Rapids Xeon 6 and Ampere's Altra CPUs demonstrate promising results with small LLMs. CPUs may not replace GPUs for larger models due to memory and compute bottlenecks, but they show potential for enterprise applications handling smaller models.
CPUs Emerge as Viable Option for Running Small Generative AI Models
Amidst the proliferation of generative AI chatbots like ChatGPT and Gemini, discussions have centered on their dependence on high-performance computing resources such as GPUs and dedicated accelerators. However, recent advancements in CPU technology are challenging this paradigm, suggesting that CPUs can effectively handle smaller generative AI models.
Performance Enhancements through Software Optimizations and Hardware Improvements
Traditionally, running large language models (LLMs) on CPU cores has been hampered by slower performance. However, ongoing software optimizations and hardware enhancements are bridging this performance gap.
Intel has showcased promising results with its upcoming Granite Rapids Xeon 6 processor, demonstrating that it can run Meta's Llama2-70B model with a second-token latency of 82 milliseconds (ms), a significant improvement over its previous Xeon processors. Oracle has likewise reported impressive performance running the Llama2-7B model on Ampere's Altra CPUs, achieving throughput of 33 to 119 tokens per second.
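For context, second-token latency and single-stream throughput are two views of the same quantity: at batch size 1, throughput is roughly the reciprocal of per-token latency. A quick sanity check on the figures above (our arithmetic, not vendor data):

```python
# Converting between per-token latency and single-stream throughput.
# The input figures come from the article; the conversion is ours.

second_token_latency_s = 0.082  # Intel: Llama2-70B at 82 ms/token

# At batch size 1, each generated token costs one forward pass,
# so throughput is simply the reciprocal of per-token latency.
print(f"82 ms/token ~= {1.0 / second_token_latency_s:.1f} tokens/s")  # ~12.2

# Oracle's Llama2-7B-on-Altra numbers, restated as per-token latency:
for tps in (33, 119):
    print(f"{tps} tokens/s ~= {1000.0 / tps:.0f} ms/token")  # ~30 and ~8 ms
```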
Customizations and Collaborations Enhance Performance
Ampere attributes its gains in particular to custom software libraries and optimizations developed in collaboration with Oracle. Intel and Oracle have subsequently shared performance data for Meta's newly launched Llama3 models, which show similar characteristics.
Suitability for Small Models and Potential for Modestly Sized Models
Based on the available performance data, CPUs have emerged as a viable option for running small generative AI models. It is anticipated that CPUs may soon be capable of handling modestly sized models, especially at lower batch sizes.
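To make "running a small generative AI model on a CPU" concrete, here is a minimal sketch using the Hugging Face transformers library. The model name is a deliberately tiny placeholder, and this plain setup reflects none of the vendor-specific optimizations described above:

```python
# Minimal CPU-only text generation with Hugging Face transformers.
# Illustrative only: "gpt2" is a small placeholder model, not one of
# the Llama models or optimized runtimes discussed in the article.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="gpt2",  # small enough to run comfortably on a laptop CPU
    device=-1,     # -1 pins the pipeline to the CPU (no GPU required)
)

result = generator(
    "CPUs can serve small language models because",
    max_new_tokens=40,
)
print(result[0]["generated_text"])
```

Production CPU deployments typically add quantization and tuned kernels (for example via llama.cpp or vendor-optimized libraries), which is where much of the performance cited above comes from.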
Persistent Bottlenecks Keep GPUs and Accelerators Necessary for Larger Models
While CPUs demonstrate improved performance for generative AI workloads, compute and memory bottlenecks still prevent them from fully replacing GPUs or dedicated accelerators on larger models. For state-of-the-art generative AI models, specialized products like Intel's Gaudi accelerators remain necessary.
Overcoming Memory Limitations through Innovative Technologies
Unlike GPUs, CPUs rely on less expensive, higher-capacity DRAM modules for memory, a significant advantage for fitting large models. However, DRAM delivers far less memory bandwidth than the HBM modules found on GPUs, and at small batch sizes token generation is largely bandwidth-bound.
Intel's Granite Rapids Xeon 6 platform addresses this limitation with Multiplexer Combined Rank (MCR) DIMMs, which enable much faster memory access. Together with lower-precision support in Intel's enhanced AMX engine, this roughly doubles effective performance while reducing model footprint and memory requirements.
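Why bandwidth matters so much: at batch size 1, generating each token requires streaming essentially all of the model's weights from memory, so bandwidth sets a hard ceiling on tokens per second regardless of compute. A rough illustration, with all bandwidth figures as assumptions rather than published specs:

```python
# Rough ceiling on single-stream decode speed: each generated token
# reads (approximately) all model weights once, so
#   tokens/s <= memory_bandwidth_bytes_per_s / model_size_bytes.
# All bandwidth figures below are illustrative assumptions.

def max_tokens_per_s(params_billions, bytes_per_param, bandwidth_gbs):
    model_bytes = params_billions * 1e9 * bytes_per_param
    return bandwidth_gbs * 1e9 / model_bytes

# Llama2-70B quantized to 1 byte/parameter (~70 GB of weights):
for name, bw in [("many-channel DDR5 server CPU (assumed)", 300),
                 ("HBM-equipped accelerator (assumed)", 2000)]:
    print(f"{name}: ceiling ~{max_tokens_per_s(70, 1, bw):.1f} tokens/s")
```

The gap in the printout mirrors the gap in the article: faster DIMMs and smaller, lower-precision weights both raise the CPU's ceiling.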
Balanced Approach to AI Capability Optimization
CPU designers face the challenge of optimizing their products for a wide range of AI models. Rather than prioritizing the largest, most demanding LLMs, vendors look at how model sizes are actually distributed and target enterprise-grade workloads.
Data from both Intel and Ampere suggests that the sweet spot for AI models in the current market lies within the 7-13 billion parameter range. These models are expected to remain mainstream, while frontier models may continue to grow in size at a slower pace.
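Simple memory math helps explain why that range suits CPUs: a model's weight footprint is roughly its parameter count times bytes per parameter, and at 8-bit or 4-bit precision a 7-13 billion parameter model fits comfortably in ordinary server DRAM. A quick illustration:

```python
# Approximate weight footprint: parameters x bytes per parameter.
# (Ignores activations and KV cache, so real usage runs somewhat higher.)
PRECISIONS = {"FP16": 2.0, "INT8": 1.0, "INT4": 0.5}  # bytes per parameter

for params_b in (7, 13):
    sizes = ", ".join(
        f"{name}: {params_b * bytes_pp:.1f} GB"
        for name, bytes_pp in PRECISIONS.items()
    )
    print(f"{params_b}B model -> {sizes}")
# 7B  -> FP16: 14.0 GB, INT8: 7.0 GB, INT4: 3.5 GB
# 13B -> FP16: 26.0 GB, INT8: 13.0 GB, INT4: 6.5 GB
```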
Competitive Performance Against GPUs at Low Batch Sizes
Ampere's testing showed its CPUs performing competitively against Arm-based CPUs from AWS and against Nvidia's A10 GPU at small batch sizes. At higher batch sizes, however, GPUs pull ahead on the strength of their massive compute capacity.
Nonetheless, Ampere argues that the scalability of CPUs makes them more suitable for enterprise environments where the need for large-scale parallel processing is less common.
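The batch-size crossover follows from the same bandwidth arithmetic: at batch 1 the one-time weight read dominates each decode step, but batching amortizes that read across many concurrent sequences until raw compute becomes the limit, which is where GPUs shine. A toy roofline-style model of the tradeoff, with every hardware figure assumed purely for illustration:

```python
# Toy roofline model of decode throughput vs. batch size. One decode
# step streams the weights once (memory cost) and does ~2*params FLOPs
# per sequence in the batch (compute cost). All figures are assumed.

PARAMS = 7e9                  # 7B-parameter model
WEIGHT_BYTES = PARAMS * 1.0   # 1 byte/param (INT8) ~= 7 GB

def tokens_per_s(batch, bandwidth_gbs, compute_tflops):
    mem_s = WEIGHT_BYTES / (bandwidth_gbs * 1e9)            # weight read
    flops_s = batch * 2 * PARAMS / (compute_tflops * 1e12)  # matmuls
    return batch / max(mem_s, flops_s)  # the slower side sets the pace

for batch in (1, 4, 16, 64):
    cpu = tokens_per_s(batch, bandwidth_gbs=300, compute_tflops=4)
    gpu = tokens_per_s(batch, bandwidth_gbs=600, compute_tflops=125)
    print(f"batch {batch:>2}: CPU ~{cpu:5.0f} tok/s, GPU ~{gpu:5.0f} tok/s")
```

Under these assumed numbers the CPU and GPU are within a small factor of each other at batch 1, but the GPU's compute headroom lets it keep scaling as the batch grows, matching the pattern Ampere describes.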
Conclusion
As generative AI technology evolves, CPUs are emerging as a viable option for running small and potentially modestly sized models, thanks to ongoing performance enhancements and innovative memory solutions. While GPUs and dedicated accelerators remain essential for larger models, CPUs are poised to play a significant role in the practical deployment of AI solutions for enterprise applications.