NVIDIA's Llama 3.1-Nemotron-51B sets a new benchmark in AI with exceptional accuracy and efficiency, enabling high workloads on a single GPU.
NVIDIA's latest language model, Llama 3.1-Nemotron-51B, sets new standards in AI performance with exceptional accuracy and efficiency. This model marks an advance in scaling LLMs to fit on a single GPU, even under high workloads.
NVIDIA has unveiled a new language model, dubbed Llama 3.1-Nemotron-51B, promising a leap in AI performance with superior accuracy and efficiency. This model is derived from Meta's Llama-3.1-70B and leverages a novel Neural Architecture Search (NAS) approach to optimize both accuracy and efficiency. Remarkably, this model can fit on a single NVIDIA H100 GPU, even under high workloads, making it more accessible and cost-effective.
The Llama 3.1-Nemotron-51B model delivers 2.2 times faster inference than the reference Llama-3.1-70B model while maintaining nearly identical accuracy. Thanks to its reduced memory footprint and optimized architecture, it can also handle 4 times larger workloads on a single GPU during inference.
One of the challenges in adopting large language models (LLMs) is their high inference cost. The Llama 3.1-Nemotron-51B model addresses this by offering a balanced tradeoff between accuracy and efficiency, making it a cost-effective solution for various applications, ranging from edge systems to cloud data centers. This capability is especially useful for deploying multiple models via Kubernetes and NIM blueprints.
The Nemotron model is optimized with TensorRT-LLM engines for higher inference performance and packaged as an NVIDIA NIM inference microservice. This setup simplifies and accelerates the deployment of generative AI models across NVIDIA's accelerated infrastructure, including cloud, data centers, and workstations.
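For readers who want to try the model, a NIM deployment exposes an OpenAI-compatible API, so a standard client library can query it. The minimal sketch below assumes a local deployment listening on port 8000 and the model identifier "nvidia/llama-3.1-nemotron-51b-instruct"; both values are illustrative assumptions and should be checked against the documentation of the NIM container you actually deploy.

```python
# Minimal sketch: querying a locally deployed NIM microservice through its
# OpenAI-compatible endpoint. The base URL and model id are assumptions.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # assumed default port of a local NIM container
    api_key="not-used",                   # local deployments typically ignore the key
)

response = client.chat.completions.create(
    model="nvidia/llama-3.1-nemotron-51b-instruct",  # assumed model identifier
    messages=[{"role": "user", "content": "Summarize what a NIM microservice is."}],
    max_tokens=128,
    temperature=0.2,
)
print(response.choices[0].message.content)
```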
The Llama 3.1-Nemotron-51B-Instruct model was built using efficient NAS technology and training methods, which enable the creation of non-standard transformer models optimized for specific GPUs. This approach includes a block-distillation framework to train various block variants in parallel, ensuring efficient and accurate inference.
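To make the block-distillation idea concrete, the sketch below shows the general pattern in PyTorch: a frozen parent transformer block produces target hidden states, and several candidate block variants are trained independently to reproduce those targets with a simple regression loss, so each variant can later be scored on accuracy versus cost. This is an illustrative toy, not NVIDIA's training code; all dimensions and variant choices are made up.

```python
import torch
import torch.nn as nn

d_model, n_heads, seq_len, batch = 512, 8, 128, 4

# Frozen "parent" block whose behaviour the variants must imitate.
parent = nn.TransformerEncoderLayer(d_model, n_heads, dim_feedforward=4 * d_model,
                                    batch_first=True)
parent.eval()
for p in parent.parameters():
    p.requires_grad_(False)

# Candidate variants: a block with a narrower FFN, and an attention-free MLP block.
variants = nn.ModuleList([
    nn.TransformerEncoderLayer(d_model, n_heads, dim_feedforward=2 * d_model,
                               batch_first=True),
    nn.Sequential(nn.Linear(d_model, 2 * d_model), nn.GELU(),
                  nn.Linear(2 * d_model, d_model)),
])
opts = [torch.optim.AdamW(v.parameters(), lr=1e-4) for v in variants]

# One distillation step: every variant sees the same input hidden states and is
# pushed (via MSE) toward the parent block's output; variants train independently.
hidden = torch.randn(batch, seq_len, d_model)
with torch.no_grad():
    target = parent(hidden)

for variant, opt in zip(variants, opts):
    out = variant(hidden)
    loss = nn.functional.mse_loss(out, target)
    opt.zero_grad()
    loss.backward()
    opt.step()
    print(f"{variant.__class__.__name__}: distillation loss {loss.item():.4f}")
```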
NVIDIA's NAS approach allows users to select their optimal balance between accuracy and efficiency. For instance, the Llama-3.1-Nemotron-40B-Instruct variant was created to prioritize speed and cost, achieving a 3.2 times speed increase compared to the parent model with a moderate decrease in accuracy.
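Conceptually, choosing among NAS-produced variants is a Pareto-style selection: given an accuracy floor, pick the fastest architecture that still meets it. The short sketch below illustrates that logic; only the 2.2x and 3.2x speedups correspond to the figures cited above, while the relative accuracy values are placeholders, not published numbers.

```python
candidates = [
    # (name, relative accuracy vs. parent, relative throughput vs. parent)
    ("parent-70B",        1.00, 1.0),
    ("nemotron-51b-like", 0.99, 2.2),   # rough trade-off described in the article
    ("nemotron-40b-like", 0.97, 3.2),   # faster, with a moderate accuracy drop
]

def pick(candidates, min_accuracy):
    """Return the fastest candidate that still meets the accuracy floor."""
    feasible = [c for c in candidates if c[1] >= min_accuracy]
    return max(feasible, key=lambda c: c[2]) if feasible else None

print(pick(candidates, min_accuracy=0.98))  # -> the 51B-style variant
print(pick(candidates, min_accuracy=0.95))  # -> the 40B-style variant
```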
The Llama 3.1-Nemotron-51B-Instruct model has been evaluated on several industry-standard benchmarks, where it performs strongly across a range of scenarios. It roughly doubles the throughput of the reference model, making it cost-effective across multiple use cases.
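Throughput comparisons of this kind can be reproduced in rough form against any OpenAI-compatible serving endpoint. The snippet below times a single chat completion and reports generated tokens per second; the endpoints, ports, and model identifiers are placeholders, the usage field is assumed to be returned by the server, and a serious benchmark would use concurrent requests and representative prompt and output lengths.

```python
import time
import requests

def tokens_per_second(base_url, model, prompt, max_tokens=256):
    """Send one chat completion and report generated tokens per wall-clock second."""
    start = time.time()
    r = requests.post(
        f"{base_url}/v1/chat/completions",
        json={
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
            "max_tokens": max_tokens,
        },
        timeout=300,
    )
    r.raise_for_status()
    elapsed = time.time() - start
    completion_tokens = r.json()["usage"]["completion_tokens"]
    return completion_tokens / elapsed

# Hypothetical local endpoints for the two deployments being compared.
print(tokens_per_second("http://localhost:8000",
                        "nvidia/llama-3.1-nemotron-51b-instruct",
                        "Explain KV caching in one paragraph."))
print(tokens_per_second("http://localhost:8001",
                        "meta/llama-3.1-70b-instruct",
                        "Explain KV caching in one paragraph."))
```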
The Llama 3.1-Nemotron-51B-Instruct model offers a new set of possibilities for users and companies to leverage highly accurate foundation models cost-effectively. Its balance between accuracy and efficiency makes it an attractive option for builders and highlights the effectiveness of the NAS approach, which NVIDIA aims to extend to other models.