Cryptocurrency News Articles

NVIDIA Launches Llama 3.1-Nemotron-51B: A Leap in Accuracy and Efficiency

2024/09/24 19:06

NVIDIA's Llama 3.1-Nemotron-51B sets a new benchmark in the AI field with superior accuracy and efficiency, enabling high workloads on a single GPU.

NVIDIA's latest language model, Llama 3.1-Nemotron-51B, sets new standards in AI performance with exceptional accuracy and efficiency. This model marks an advance in scaling LLMs to fit on a single GPU, even under high workloads.

NVIDIA has unveiled a new language model, dubbed Llama 3.1-Nemotron-51B, promising a leap in AI performance with superior accuracy and efficiency. This model is derived from Meta's Llama-3.1-70B and leverages a novel Neural Architecture Search (NAS) approach to optimize both accuracy and efficiency. Remarkably, this model can fit on a single NVIDIA H100 GPU, even under high workloads, making it more accessible and cost-effective.

The Llama 3.1-Nemotron-51B model boasts 2.2 times faster inference speeds while maintaining a nearly identical level of accuracy compared to its predecessors. This efficiency enables 4 times larger workloads on a single GPU during inference, thanks to its reduced memory footprint and optimized architecture.

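To make those multipliers concrete, the back-of-envelope below converts an inference speedup into cost per generated token. The dollar rate and baseline throughput are hypothetical placeholders; only the 2.2x factor comes from the article.

```python
# Illustrative arithmetic only: baseline throughput and GPU pricing
# are hypothetical; the 2.2x speedup is the figure cited in the article.

def cost_per_million_tokens(gpu_hour_usd, tokens_per_second):
    """USD cost to generate one million tokens on a single GPU."""
    seconds_needed = 1_000_000 / tokens_per_second
    return gpu_hour_usd * seconds_needed / 3600

baseline_tps = 1000                 # hypothetical reference-model throughput
nemotron_tps = baseline_tps * 2.2   # 2.2x faster inference (per article)
gpu_hour = 4.0                      # hypothetical H100 hourly rate in USD

print(round(cost_per_million_tokens(gpu_hour, baseline_tps), 4))  # → 1.1111
print(round(cost_per_million_tokens(gpu_hour, nemotron_tps), 4))  # → 0.5051
```

The same speedup therefore cuts per-token serving cost by the same factor, before even counting the larger batch sizes the smaller memory footprint allows.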
One of the challenges in adopting large language models (LLMs) is their high inference cost. The Llama 3.1-Nemotron-51B model addresses this by offering a balanced tradeoff between accuracy and efficiency, making it a cost-effective solution for various applications, ranging from edge systems to cloud data centers. This capability is especially useful for deploying multiple models via Kubernetes and NIM blueprints.

The Nemotron model is optimized with TensorRT-LLM engines for higher inference performance and packaged as an NVIDIA NIM inference microservice. This setup simplifies and accelerates the deployment of generative AI models across NVIDIA's accelerated infrastructure, including cloud, data centers, and workstations.

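NIM microservices expose an OpenAI-compatible chat-completions HTTP API, so a deployed Nemotron model can be queried with a standard request payload. The sketch below only builds that payload; the model identifier and endpoint shown in the comments are placeholders, not values confirmed by the article.

```python
# Minimal sketch of addressing a NIM inference microservice.
# The model id and base URL are placeholder assumptions.
import json

def build_chat_request(model, prompt, max_tokens=256):
    """Build an OpenAI-compatible chat-completions payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

payload = build_chat_request(
    "nvidia/llama-3.1-nemotron-51b-instruct",  # placeholder model id
    "Summarize the benefits of NAS-optimized LLMs.",
)

# In practice this would be POSTed to the running microservice, e.g.:
#   requests.post(f"{base_url}/v1/chat/completions",
#                 headers={"Authorization": f"Bearer {api_key}"},
#                 data=json.dumps(payload))
print(json.dumps(payload, indent=2))
```

Because the interface is OpenAI-compatible, existing client code can usually be pointed at a NIM endpoint by changing only the base URL and model name.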
The Llama 3.1-Nemotron-51B-Instruct model was built using efficient NAS technology and training methods, which enable the creation of non-standard transformer models optimized for specific GPUs. This approach includes a block-distillation framework to train various block variants in parallel, ensuring efficient and accurate inference.

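The core idea of block distillation can be sketched in a few lines: each candidate student block is trained to reproduce the output of the corresponding frozen teacher block on the same incoming activations. Linear "blocks" and plain gradient steps stand in for transformer blocks and a real optimizer here; this is an illustration of the principle, not NVIDIA's training code.

```python
# Toy block distillation: fit a cheaper student block to mimic a
# frozen teacher block's outputs. Linear maps stand in for
# transformer blocks; this is purely illustrative.
import numpy as np

rng = np.random.default_rng(0)
d = 16
teacher_w = rng.normal(size=(d, d))          # frozen teacher block
student_w = rng.normal(size=(d, d)) * 0.01   # candidate student variant

lr = 0.05
for step in range(500):
    x = rng.normal(size=(32, d))             # activations entering the block
    t = x @ teacher_w                        # teacher block output
    s = x @ student_w                        # student block output
    grad = x.T @ (s - t) / len(x)            # d(MSE)/d(student_w)
    student_w -= lr * grad                   # gradient step on the student

mse = float(np.mean((x @ student_w - x @ teacher_w) ** 2))
print(mse)  # near zero once the student matches the teacher block
```

In the actual framework many such block variants are trained in parallel, and the search then assembles the final network from the variants with the best accuracy/efficiency trade-off per GPU target.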
NVIDIA's NAS approach allows users to select their optimal balance between accuracy and efficiency. For instance, the Llama-3.1-Nemotron-40B-Instruct variant was created to prioritize speed and cost, achieving a 3.2 times speed increase compared to the parent model with a moderate decrease in accuracy.

The Llama 3.1-Nemotron-51B-Instruct model has been benchmarked against several industry standards, showcasing its superior performance in various scenarios. It doubles the throughput of the reference model, making it cost-effective across multiple use cases.

The Llama 3.1-Nemotron-51B-Instruct model offers a new set of possibilities for users and companies to leverage highly accurate foundation models cost-effectively. Its balance between accuracy and efficiency makes it an attractive option for builders and highlights the effectiveness of the NAS approach, which NVIDIA aims to extend to other models.

Disclaimer: info@kdj.com

The information provided is not trading advice. kdj.com does not assume any responsibility for any investments made based on the information provided in this article. Cryptocurrencies are highly volatile and it is highly recommended that you invest with caution after thorough research!

If you believe that the content used on this website infringes your copyright, please contact us immediately (info@kdj.com) and we will delete it promptly.