In a blog post published today, Apple engineers shared new details on a collaboration with NVIDIA to implement faster text generation performance with large language models (LLMs).
Earlier this year, Apple published and open sourced its Recurrent Drafter (ReDrafter) technique, a new method for generating text with LLMs that’s significantly faster and “achieves state of the art performance.” It combines two techniques: beam search (to explore multiple possibilities) and dynamic tree attention (to efficiently handle choices).
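To make the draft-then-verify idea behind this family of methods concrete, here is a minimal sketch of generic speculative decoding with greedy verification. It is not Apple's ReDrafter implementation: it leaves out beam search and dynamic tree attention, and the two "models" are toy functions. All names (`speculative_greedy_step`, `generate`, `toy_target`, `toy_draft`) are hypothetical and chosen only for illustration.

```python
from typing import Callable, List, Tuple

Token = int
NextTokenFn = Callable[[List[Token]], Token]


def speculative_greedy_step(prefix: List[Token], draft_next: NextTokenFn,
                            target_next: NextTokenFn, k: int) -> List[Token]:
    """Draft k tokens with the cheap model, then keep the longest prefix
    the target model agrees with, plus one token from the target itself."""
    # 1. Draft phase: the cheap model proposes k tokens autoregressively.
    draft: List[Token] = []
    ctx = list(prefix)
    for _ in range(k):
        tok = draft_next(ctx)
        draft.append(tok)
        ctx.append(tok)

    # 2. Verify phase: a real system would score all drafted positions in one
    #    batched forward pass; this toy version calls the target per position.
    accepted: List[Token] = []
    ctx = list(prefix)
    for tok in draft:
        expected = target_next(ctx)
        if tok != expected:
            accepted.append(expected)  # first mismatch: take the target's token
            return accepted
        accepted.append(tok)
        ctx.append(tok)

    # Every drafted token matched, so the target contributes one extra token.
    accepted.append(target_next(ctx))
    return accepted


def generate(prefix: List[Token], draft_next: NextTokenFn,
             target_next: NextTokenFn, k: int,
             max_new: int) -> Tuple[List[Token], int]:
    """Run draft-then-verify steps until max_new tokens have been produced."""
    out = list(prefix)
    steps = 0
    while len(out) - len(prefix) < max_new:
        out.extend(speculative_greedy_step(out, draft_next, target_next, k))
        steps += 1
    return out[:len(prefix) + max_new], steps


def toy_target(ctx: List[Token]) -> Token:
    """Hypothetical 'large' model: counts upward modulo 10."""
    return (ctx[-1] + 1) % 10


def toy_draft(ctx: List[Token]) -> Token:
    """Hypothetical 'small' model: agrees with the target except after a 2."""
    return 0 if ctx[-1] == 2 else (ctx[-1] + 1) % 10


def greedy_baseline(prefix: List[Token], max_new: int) -> List[Token]:
    """Plain greedy decoding with the target model, one token per step."""
    out = list(prefix)
    for _ in range(max_new):
        out.append(toy_target(out))
    return out


if __name__ == "__main__":
    tokens, steps = generate([0], toy_draft, toy_target, k=4, max_new=12)
    print(tokens, "verification steps:", steps)
```

Because every accepted token is either confirmed or supplied by the target model, the output matches plain greedy decoding with the target alone. The gain comes from needing fewer verification steps than generated tokens whenever the draft model guesses well.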
While its research demonstrated strong results, Apple also collaborated with NVIDIA to apply ReDrafter in production. As part of this collaboration, ReDrafter was integrated into NVIDIA TensorRT-LLM, a tool that helps run LLMs faster on NVIDIA GPUs.
Here are the results:
To enable the integration of ReDrafter, NVIDIA added new operators or exposed existing ones, which considerably improved TensorRT-LLM’s capability to accommodate sophisticated models and decoding methods. ML developers using NVIDIA GPUs can now easily benefit from ReDrafter’s accelerated token generation for their production LLM applications with TensorRT-LLM.
According to Apple, in benchmarking a tens-of-billions-parameter production model on NVIDIA GPUs, using the NVIDIA TensorRT-LLM inference acceleration framework with ReDrafter, the team saw a 2.7x speed-up in generated tokens per second for greedy decoding. These benchmark results indicate that the technique could significantly reduce the latency users experience, while also using fewer GPUs and consuming less power.
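As a rough illustration of what a 2.7x throughput gain means for generation time, the sketch below converts the speed-up into time per response. Only the 2.7x factor comes from the benchmark; the baseline rate of 20 tokens per second and the 500-token response length are hypothetical. It also assumes the speed-up applies uniformly to a single request, which real deployments may not match.

```python
def generation_time_s(tokens_per_s: float, n_tokens: int) -> float:
    """Seconds needed to generate n_tokens at a steady token rate."""
    return n_tokens / tokens_per_s


SPEEDUP = 2.7            # reported speed-up in generated tokens per second
BASELINE_RATE = 20.0     # hypothetical baseline, tokens per second
RESPONSE_TOKENS = 500    # hypothetical response length

before = generation_time_s(BASELINE_RATE, RESPONSE_TOKENS)
after = generation_time_s(BASELINE_RATE * SPEEDUP, RESPONSE_TOKENS)
reduction = 1.0 - after / before  # equals 1 - 1/SPEEDUP, independent of the baseline

if __name__ == "__main__":
    print(f"before: {before:.1f} s, after: {after:.1f} s, "
          f"time saved: {reduction:.0%}")
```

Whatever baseline is chosen, the fraction of generation time saved is 1 - 1/2.7, or roughly 63 percent.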
“LLMs are increasingly being used to power production applications, and improving inference efficiency can both impact computational costs and reduce latency for users,” Apple’s machine learning researchers conclude. “With ReDrafter’s novel approach to speculative decoding integrated into the NVIDIA TensorRT-LLM framework, developers can now benefit from faster token generation on NVIDIA GPUs for their production LLM applications.”
You can learn more about this work on Apple’s website and in a blog post on NVIDIA’s website.