Apple and NVIDIA share details of their collaboration
Tech giants Apple and NVIDIA have joined forces to enhance the performance of Large Language Models (LLMs) by introducing a new text generation technique for AI.
According to Apple, accelerating LLM inference is a crucial ML research problem. This is because auto-regressive token generation is computationally expensive and relatively slow. As a result, improving inference efficiency can reduce latency for users.
Beyond its ongoing efforts to accelerate inference on Apple silicon, the company writes in a research paper that it has recently made significant progress in accelerating LLM inference on the NVIDIA GPUs widely used for production applications across the industry.
Earlier this year, Apple published and open-sourced Recurrent Drafter (ReDrafter), which is a novel approach to speculative decoding that “achieves state of the art performance.”
According to the company, ReDrafter uses an RNN draft model and combines beam search with dynamic tree attention, speeding up LLM token generation by up to 3.5 tokens per generation step for open-source models and surpassing the performance of prior speculative decoding techniques.
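For readers unfamiliar with speculative decoding, the general draft-and-verify idea can be sketched in a few lines of Python. This is a simplified greedy variant for illustration only, not ReDrafter itself: it omits the RNN drafter, beam search, and dynamic tree attention described above. The names `speculative_decode`, `greedy_decode`, `target_model`, and `draft_model` are hypothetical, and the two "models" are toy functions standing in for a large model and a cheap drafter.

```python
from typing import Callable, List, Tuple

Token = int
NextToken = Callable[[List[Token]], Token]  # maps a context to its greedy next token


def greedy_decode(model: NextToken, prompt: List[Token], max_new_tokens: int) -> List[Token]:
    """Baseline: one model call per generated token."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tokens.append(model(tokens))
    return tokens


def speculative_decode(
    target: NextToken,
    draft: NextToken,
    prompt: List[Token],
    max_new_tokens: int,
    draft_len: int = 4,
) -> Tuple[List[Token], int]:
    """Greedy speculative decoding; returns (tokens, number of verification steps)."""
    tokens = list(prompt)
    goal = len(prompt) + max_new_tokens
    steps = 0
    while len(tokens) < goal:
        # 1. Draft: the cheap model proposes draft_len tokens in sequence.
        proposal: List[Token] = []
        for _ in range(draft_len):
            proposal.append(draft(tokens + proposal))

        # 2. Verify: the target checks each drafted position. A real system
        #    scores all positions in a single batched forward pass; this toy
        #    loop calls the target once per position for clarity.
        accepted: List[Token] = []
        for guess in proposal:
            expected = target(tokens + accepted)
            accepted.append(expected)  # always keep the target's own token
            if expected != guess:
                break  # first mismatch: discard the rest of the draft
        else:
            # Every draft token matched, so the same pass yields a bonus token.
            accepted.append(target(tokens + accepted))

        tokens.extend(accepted)
        steps += 1
    return tokens[:goal], steps


# Toy stand-ins (assumptions for illustration, not real language models).
def target_model(context: List[Token]) -> Token:
    return (context[-1] * 3 + 1) % 11


def draft_model(context: List[Token]) -> Token:
    # Agrees with the target except at every third context length.
    return target_model(context) if len(context) % 3 else 0


if __name__ == "__main__":
    output, steps = speculative_decode(target_model, draft_model, [1], 20)
    assert output == greedy_decode(target_model, [1], 20)
    print(f"generated 20 tokens in {steps} verification steps")
```

Because the target's own token is kept at every position, the output matches plain greedy decoding exactly; the gain comes from each verification step emitting more than one token whenever the draft is right.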
“In benchmarking a tens-of-billions parameter production model on NVIDIA GPUs, using the NVIDIA TensorRT-LLM inference acceleration framework with ReDrafter, we have seen 2.7x speed-up in generated tokens per second for greedy decoding,” Apple writes.
As a result, this technology could significantly reduce the latency users experience, while also using fewer GPUs and consuming less power.