NVIDIA's Llama 3.1-Nemotron-51B sets a new benchmark for AI with exceptional accuracy and efficiency, enabling high workloads on a single GPU.
NVIDIA's latest language model, Llama 3.1-Nemotron-51B, sets new standards in AI performance with exceptional accuracy and efficiency. This model marks an advance in scaling LLMs to fit on a single GPU, even under high workloads.
NVIDIA has unveiled a new language model, dubbed Llama 3.1-Nemotron-51B, promising a leap in AI performance with superior accuracy and efficiency. This model is derived from Meta's Llama-3.1-70B and leverages a novel Neural Architecture Search (NAS) approach to optimize both accuracy and efficiency. Remarkably, this model can fit on a single NVIDIA H100 GPU, even under high workloads, making it more accessible and cost-effective.
The Llama 3.1-Nemotron-51B model delivers 2.2 times faster inference while maintaining nearly the same accuracy as the parent Llama-3.1-70B model. Thanks to its reduced memory footprint and optimized architecture, it can serve 4 times larger workloads on a single GPU during inference.
One of the challenges in adopting large language models (LLMs) is their high inference cost. The Llama 3.1-Nemotron-51B model addresses this by offering a balanced tradeoff between accuracy and efficiency, making it a cost-effective solution for various applications, ranging from edge systems to cloud data centers. This capability is especially useful for deploying multiple models via Kubernetes and NIM blueprints.
The Nemotron model is optimized with TensorRT-LLM engines for higher inference performance and packaged as an NVIDIA NIM inference microservice. This setup simplifies and accelerates the deployment of generative AI models across NVIDIA's accelerated infrastructure, including cloud, data centers, and workstations.
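Because NIM microservices expose an OpenAI-compatible API, querying a deployed Nemotron endpoint can look roughly like the sketch below. The base URL, API-key handling, and exact model identifier are assumptions that depend on your deployment; check your NIM instance or build.nvidia.com for the actual values.

```python
# Minimal sketch of querying a NIM deployment through its OpenAI-compatible
# Chat Completions API. base_url, api_key, and the model id are assumptions;
# substitute the values from your own NIM deployment.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",   # assumed local NIM endpoint
    api_key="not-needed-for-local-nim",    # hosted endpoints require an NVIDIA API key
)

response = client.chat.completions.create(
    model="nvidia/llama-3_1-nemotron-51b-instruct",  # assumed model identifier
    messages=[{"role": "user", "content": "Summarize neural architecture search in one sentence."}],
    max_tokens=128,
    temperature=0.2,
)
print(response.choices[0].message.content)
```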
The Llama 3.1-Nemotron-51B-Instruct model was built using efficient NAS technology and training methods, which enable the creation of non-standard transformer models optimized for specific GPUs. This approach includes a block-distillation framework to train various block variants in parallel, ensuring efficient and accurate inference.
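To make the block-distillation idea concrete, here is a minimal, purely illustrative PyTorch sketch: a cheaper "student" block variant (here, one with attention removed and a narrower FFN, both hypothetical choices) is trained to reproduce the outputs of a frozen "teacher" block on the same hidden states. This is a toy illustration of the training signal involved, not NVIDIA's actual framework; the hidden size, block definitions, and random inputs are stand-ins.

```python
# Conceptual sketch of block distillation (illustrative only).
import torch
import torch.nn as nn

HIDDEN = 512  # hypothetical hidden size for the sketch


class TeacherBlock(nn.Module):
    """Stand-in for one frozen transformer block of the parent model."""
    def __init__(self, hidden=HIDDEN):
        super().__init__()
        self.attn = nn.MultiheadAttention(hidden, num_heads=8, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(hidden, 4 * hidden), nn.GELU(),
                                 nn.Linear(4 * hidden, hidden))
        self.norm1 = nn.LayerNorm(hidden)
        self.norm2 = nn.LayerNorm(hidden)

    def forward(self, x):
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        return x + self.ffn(self.norm2(x))


class CheapStudentBlock(nn.Module):
    """Hypothetical cheaper variant: attention skipped, narrower FFN."""
    def __init__(self, hidden=HIDDEN, ffn_mult=2):
        super().__init__()
        self.ffn = nn.Sequential(nn.Linear(hidden, ffn_mult * hidden), nn.GELU(),
                                 nn.Linear(ffn_mult * hidden, hidden))
        self.norm = nn.LayerNorm(hidden)

    def forward(self, x):
        return x + self.ffn(self.norm(x))


def distill_block(teacher, student, steps=100, lr=1e-3):
    """Train the student block to match the teacher block's outputs."""
    teacher.eval()
    opt = torch.optim.AdamW(student.parameters(), lr=lr)
    for _ in range(steps):
        x = torch.randn(4, 32, HIDDEN)   # stand-in for real hidden states
        with torch.no_grad():
            target = teacher(x)          # frozen teacher output is the target
        loss = nn.functional.mse_loss(student(x), target)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return student


if __name__ == "__main__":
    distill_block(TeacherBlock(), CheapStudentBlock())
```

In the actual NAS workflow, many such block variants are scored for accuracy and latency and then assembled into the final non-standard architecture; the sketch only shows the per-block distillation step.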
NVIDIA's NAS approach allows users to select their optimal balance between accuracy and efficiency. For instance, the Llama-3.1-Nemotron-40B-Instruct variant was created to prioritize speed and cost, achieving a 3.2 times speed increase compared to the parent model with a moderate decrease in accuracy.
The Llama 3.1-Nemotron-51B-Instruct model has been evaluated on several industry-standard benchmarks, showing superior performance across a range of scenarios. It doubles the throughput of the reference model, making it cost-effective across multiple use cases.
The Llama 3.1-Nemotron-51B-Instruct model offers a new set of possibilities for users and companies to leverage highly accurate foundation models cost-effectively. Its balance between accuracy and efficiency makes it an attractive option for builders and highlights the effectiveness of the NAS approach, which NVIDIA aims to extend to other models.