Reward-Guided Speculative Decoding: A New Paradigm for Efficient LLM Inference

2025/02/15 03:44

In recent years, the rapid scaling of large language models (LLMs) has led to tremendous improvements in natural language understanding and reasoning capabilities.

Salesforce AI Research has introduced Reward-Guided Speculative Decoding (RSD), a novel framework for efficient inference in large language models (LLMs). The approach aims to strike a balance between speed and performance, addressing the computational challenges faced by LLMs during sequential token generation.

At a Glance

RSD combines a fast, lightweight “draft” model with a more robust “target” model.

A process reward model (PRM) evaluates draft model outputs in real time.

RSD introduces a controlled bias to prioritize high-reward outputs.

The approach enables “biased acceleration” and outperforms speculative decoding.

RSD achieves up to 4.4× faster inference and a +3.5 average accuracy improvement.

Technical Details and Benefits of RSD

Delving into the technical aspects, RSD operates by integrating two models in a sequential yet collaborative manner. Initially, the draft model produces candidate tokens or reasoning steps at a low computational cost. Each candidate is then evaluated using a reward function, which acts as a quality gate. If a candidate token’s reward exceeds a predetermined threshold, the output is accepted; if not, the system calls upon the more computationally intensive target model to generate a refined token. This process is guided by a weighting function—typically a binary step function—that adjusts the reliance on the draft versus the target model.
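
For readers who prefer code, here is a minimal sketch of that per-step control flow in Python. It is illustrative only, not the authors' implementation: the draft, target, and PRM callables, the 0.7 threshold, and the stop token are hypothetical stand-ins.

from typing import Callable, List

# Hypothetical types: each "step" is a reasoning step or token string.
Context = List[str]
StepFn = Callable[[Context], str]            # model: context -> next candidate step
RewardFn = Callable[[Context, str], float]   # PRM: (context, candidate) -> reward score


def rsd_generate(
    draft_step: StepFn,
    target_step: StepFn,
    reward_fn: RewardFn,
    prompt: Context,
    threshold: float = 0.7,    # acceptance threshold (illustrative value)
    max_steps: int = 32,
    stop_token: str = "<eos>",
) -> Context:
    """Reward-guided speculative decoding with a binary step weighting function:
    accept the draft candidate when its PRM reward clears the threshold,
    otherwise fall back to the target model for that step."""
    steps: Context = list(prompt)
    for _ in range(max_steps):
        candidate = draft_step(steps)          # cheap proposal from the draft model
        reward = reward_fn(steps, candidate)   # the PRM acts as a quality gate
        if reward >= threshold:
            chosen = candidate                 # biased acceptance: keep the high-reward draft output
        else:
            chosen = target_step(steps)        # call the expensive target model only when needed
        steps.append(chosen)
        if chosen == stop_token:
            break
    return steps


if __name__ == "__main__":
    # Toy stand-ins so the sketch runs end to end.
    draft = lambda ctx: f"draft-step-{len(ctx)}"
    target = lambda ctx: f"target-step-{len(ctx)}"
    prm = lambda ctx, cand: 0.9 if len(ctx) % 2 == 0 else 0.3
    print(rsd_generate(draft, target, prm, ["question"], max_steps=4))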

The dynamic quality control afforded by the process reward model (PRM) ensures that only the most promising outputs bypass the target model, thereby saving on computation. One of the standout benefits of this approach is “biased acceleration,” where the controlled bias is not a detriment but rather a strategic choice to prioritize high-reward outcomes. This results in two key benefits: first, the overall inference process can be up to 4.4× faster compared to running the target model alone; second, it often yields a +3.5 average accuracy improvement over conventional parallel decoding baselines. In essence, RSD harmonizes efficiency with accuracy—allowing for a substantial reduction in the number of floating-point operations (FLOPs) while still delivering outputs that meet or even exceed the performance of the target model. The theoretical underpinnings and algorithmic details, such as the mixture distribution P_RSD and the adaptive acceptance criterion, provide a robust framework for practical deployment in diverse reasoning tasks.
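
Written out, the mixture the article alludes to takes roughly the following form (the notation below is an assumption made for illustration; the paper's exact definitions may differ):

\[
P_{\mathrm{RSD}}(y_i \mid x, y_{<i}) = \omega(r_i)\, P_{\mathrm{draft}}(y_i \mid x, y_{<i}) + \bigl(1 - \omega(r_i)\bigr)\, P_{\mathrm{target}}(y_i \mid x, y_{<i}),
\qquad
\omega(r_i) = \mathbb{1}\left[r_i \ge \delta\right],
\]

where \(r_i\) is the PRM reward assigned to the draft candidate at step \(i\) and \(\delta\) is the acceptance threshold. With the binary step weighting shown, a step is taken from the draft model exactly when its reward clears the threshold, and from the target model otherwise.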

Insights

The empirical validation of RSD is compelling. Experiments detailed in the paper demonstrate that, on challenging benchmarks such as GSM8K, MATH500, OlympiadBench, and GPQA, RSD consistently delivers superior performance. For instance, on the MATH500 benchmark—a dataset designed to test mathematical reasoning—RSD achieved an accuracy of 88.0 when configured with a 72B target model and a 7B PRM, compared to 85.6 for the target model running alone. This configuration not only reduces the computational load (nearly 4.4× fewer FLOPs) but also enhances reasoning accuracy. The results underscore the potential of RSD to outperform traditional methods, such as speculative decoding (SD), and even advanced search-based techniques like beam search or Best-of-N strategies.

Conclusion: A New Paradigm for Efficient LLM Inference

In conclusion, Reward-Guided Speculative Decoding (RSD) marks a significant milestone in the quest for more efficient LLM inference. By intelligently combining a lightweight draft model with a powerful target model, and by introducing a reward-based acceptance criterion, RSD effectively addresses the dual challenges of computational cost and output quality. The innovative approach of biased acceleration allows the system to selectively bypass expensive computations for high-reward outputs, thereby streamlining the inference process. The dynamic quality control mechanism—anchored by a process reward model—ensures that computational resources are allocated judiciously, engaging the target model only when necessary. With empirical results showing up to 4.4× faster inference and an average accuracy improvement of +3.5 over traditional methods, RSD not only paves the way for more scalable LLM deployments but also sets a new standard in the design of hybrid decoding frameworks.

Check out the Paper and GitHub Page. All credit for this research goes to the researchers of this project. Also, feel free to follow us on Twitter and don’t forget to join our 75k+ ML SubReddit.
