Apple and NVIDIA share details of their collaboration
Tech giants Apple and NVIDIA have joined forces to enhance the performance of Large Language Models (LLMs) by introducing a new text generation technique for AI.
According to Apple, accelerating LLM inference is a crucial ML research problem. This is because auto-regressive token generation is computationally expensive and relatively slow. As a result, improving inference efficiency can reduce latency for users.
Apple writes in a research paper that, in addition to its ongoing efforts to accelerate inference on Apple silicon, it has recently made significant progress in accelerating LLM inference on the NVIDIA GPUs widely used for production applications across the industry.
Earlier this year, Apple published and open-sourced Recurrent Drafter (ReDrafter), which is a novel approach to speculative decoding that “achieves state of the art performance.”
According to the company, ReDrafter uses an RNN draft model and combines beam search with dynamic tree attention, generating up to 3.5 tokens per generation step for open-source models and surpassing the performance of prior speculative decoding techniques.
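To make the idea of speculative decoding more concrete, here is a minimal sketch of the general draft-then-verify loop with greedy decoding. It is not ReDrafter itself: there is no RNN drafter, beam search, or dynamic tree attention here, and `target_next`, `draft_next`, and the parameter `k` are hypothetical toy stand-ins rather than anything from Apple's or NVIDIA's code. The point it illustrates is that a cheap draft model can propose several tokens that the large model then checks in a single step, so more than one token can be accepted per step without changing the output.

```python
from typing import Callable, List, Tuple

Token = int
Model = Callable[[List[Token]], Token]


def target_next(context: List[Token]) -> Token:
    """Hypothetical 'large' model: a deterministic toy rule, not a real LLM."""
    return (context[-1] * 3 + len(context)) % 11


def draft_next(context: List[Token]) -> Token:
    """Hypothetical cheap draft model: matches the target except when the
    context length is a multiple of 4, where it deliberately guesses wrong."""
    guess = target_next(context)
    return (guess + 1) % 11 if len(context) % 4 == 0 else guess


def greedy_decode(model: Model, prompt: List[Token], n_new: int) -> List[Token]:
    """Plain auto-regressive decoding: one model call per generated token."""
    out = list(prompt)
    for _ in range(n_new):
        out.append(model(out))
    return out


def speculative_decode(
    target: Model, draft: Model, prompt: List[Token], n_new: int, k: int = 4
) -> Tuple[List[Token], int]:
    """Greedy speculative decoding; returns the tokens and the number of
    verification steps taken by the target model."""
    out = list(prompt)
    goal = len(prompt) + n_new
    steps = 0
    while len(out) < goal:
        # 1) The draft model proposes k tokens auto-regressively.
        proposal: List[Token] = []
        for _ in range(k):
            proposal.append(draft(out + proposal))

        # 2) The target verifies the proposal. A real system would score all
        #    k positions in one batched forward pass; here that pass is
        #    simulated position by position and counted as a single step.
        steps += 1
        accepted: List[Token] = []
        for tok in proposal:
            expected = target(out + accepted)
            if tok != expected:
                accepted.append(expected)  # replace the first wrong token
                break
            accepted.append(tok)
        else:
            # Every draft token matched, so the target adds one more token.
            accepted.append(target(out + accepted))
        out.extend(accepted)
    return out[:goal], steps


if __name__ == "__main__":
    tokens, steps = speculative_decode(target_next, draft_next, [1], 12)
    print("tokens:", tokens)
    print("target verification steps:", steps)
    print("same as plain greedy decoding:",
          tokens == greedy_decode(target_next, [1], 12))
```

In this toy setup the output is identical to plain greedy decoding with the target model, while the target is consulted far fewer times than once per token. The exact ratio depends entirely on how often the made-up draft model agrees with the target, so it says nothing about the figures Apple reports.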
“In benchmarking a tens-of-billions parameter production model on NVIDIA GPUs, using the NVIDIA TensorRT-LLM inference acceleration framework with ReDrafter, we have seen 2.7x speed-up in generated tokens per second for greedy decoding,” Apple writes.
As a result, this technology could significantly reduce the latency users experience, while also using fewer GPUs and consuming less power.