In a blog post today, Apple engineers shared new details on a collaboration with NVIDIA to achieve faster text generation with large language models (LLMs).
Earlier this year, Apple published and open-sourced its Recurrent Drafter (ReDrafter) technique, a new method for generating text with LLMs that’s significantly faster and “achieves state of the art performance.” It combines two techniques: beam search (to explore multiple possibilities) and dynamic tree attention (to efficiently handle choices).
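ReDrafter belongs to the family of speculative decoding methods, in which a cheap draft model proposes several tokens and the large model verifies them. The following is a minimal sketch of that general draft-and-verify loop for greedy decoding, not Apple's implementation: it omits beam search and dynamic tree attention entirely, and the two toy "models" (`target_next`, `draft_next`) are hypothetical stand-ins invented purely for illustration.

```python
from typing import Callable, List, Tuple

# A "model" here is any function mapping a token sequence to its greedy next token.
NextToken = Callable[[List[int]], int]


def target_next(tokens: List[int]) -> int:
    """Hypothetical 'large' model: next token is the sum of all tokens mod 7."""
    return sum(tokens) % 7


def draft_next(tokens: List[int]) -> int:
    """Hypothetical cheap draft model: matches the target unless the last token is 3."""
    guess = sum(tokens) % 7
    return (guess + 1) % 7 if tokens[-1] == 3 else guess


def greedy_decode(model: NextToken, prompt: List[int], n_new: int) -> List[int]:
    """Baseline: one model call per generated token."""
    tokens = list(prompt)
    for _ in range(n_new):
        tokens.append(model(tokens))
    return tokens


def speculative_decode(
    target: NextToken, draft: NextToken, prompt: List[int], n_new: int, k: int = 4
) -> Tuple[List[int], int]:
    """Draft k tokens, keep the prefix the target agrees with, repeat.

    Returns the generated sequence and the number of verification rounds.
    In a real system each round is a single batched forward pass of the
    large model; here it is simulated with a plain loop.
    """
    tokens = list(prompt)
    goal = len(prompt) + n_new
    rounds = 0
    while len(tokens) < goal:
        # 1. The draft model proposes k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(tokens + proposal))

        # 2. The target model checks the proposal position by position.
        rounds += 1
        accepted: List[int] = []
        for tok in proposal:
            expected = target(tokens + accepted)
            if tok != expected:
                accepted.append(expected)  # replace the first wrong guess and stop
                break
            accepted.append(tok)
        else:
            # Every draft token was accepted; the same pass yields one extra token.
            accepted.append(target(tokens + accepted))

        tokens.extend(accepted)
    return tokens[:goal], rounds


if __name__ == "__main__":
    prompt = [1, 2]
    baseline = greedy_decode(target_next, prompt, 12)
    output, rounds = speculative_decode(target_next, draft_next, prompt, 12)
    print(output == baseline, rounds)
```

Because every accepted token is checked against the target model's own greedy choice, the output is identical to plain greedy decoding; the potential gain comes from needing fewer passes of the large model whenever the draft guesses well.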
While its research demonstrated strong results, Apple also collaborated with NVIDIA to apply ReDrafter in production. As part of this collaboration, ReDrafter was integrated into NVIDIA TensorRT-LLM, a tool that helps run LLMs faster on NVIDIA GPUs.
Here are the results:
To enable the integration of ReDrafter, NVIDIA added new operators or exposed existing ones, which considerably improved TensorRT-LLM’s capability to accommodate sophisticated models and decoding methods. ML developers using NVIDIA GPUs can now easily benefit from ReDrafter’s accelerated token generation for their production LLM applications with TensorRT-LLM.
In benchmarking a tens-of-billions parameter production model on NVIDIA GPUs, using the NVIDIA TensorRT-LLM inference acceleration framework with ReDrafter, we have seen 2.7x speed-up in generated tokens per second for greedy decoding. These benchmark results indicate this tech could significantly reduce latency users may experience, while also using fewer GPUs and consuming less power.
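As a rough illustration of what the reported figure implies, a 2.7x increase in generated tokens per second corresponds to each token taking about 1/2.7 of the baseline time. The snippet below works through that arithmetic; it assumes the speed-up applies uniformly to every token, which is an idealization rather than something the benchmark states, and the function name is made up for this example.

```python
def per_token_time_ratio(speedup: float) -> float:
    """Time per generated token relative to baseline, given a tokens/sec speed-up."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return 1.0 / speedup


speedup = 2.7  # reported tokens-per-second speed-up for greedy decoding
ratio = per_token_time_ratio(speedup)

print(f"time per token: {ratio:.0%} of baseline")   # prints "time per token: 37% of baseline"
print(f"reduction per token: {1 - ratio:.0%}")      # prints "reduction per token: 63%"
```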
“LLMs are increasingly being used to power production applications, and improving inference efficiency can both impact computational costs and reduce latency for users,” Apple’s machine learning researchers conclude. “With ReDrafter’s novel approach to speculative decoding integrated into the NVIDIA TensorRT-LLM framework, developers can now benefit from faster token generation on NVIDIA GPUs for their production LLM applications.”
You can learn more about this work on Apple’s website and in a blog post on NVIDIA’s website.