Market cap: $3.2442T (0.380%)
24h trading volume: $98.8533B (0.250%)

Coin         Price (USD)      24h change
bitcoin      97526.112350     0.66%
ethereum     2710.567365      0.31%
xrp          2.768985         8.28%
tether       1.000110         0.00%
solana       196.035651       -0.12%
bnb          658.120584       -1.60%
usd-coin     1.000012         -0.01%
dogecoin     0.276466         5.29%
cardano      0.797528         -0.06%
tron         0.233113         0.57%
chainlink    19.423416        3.49%
avalanche    26.420701        2.87%
stellar      0.353632         4.81%
sui          3.453367         -0.88%
shiba-inu    0.000017         2.24%

Cryptocurrency News

Reward-Guided Speculative Decoding: A New Paradigm for Efficient LLM Inference

2025/02/15 03:44

In recent years, the rapid scaling of large language models (LLMs) has led to dramatic improvements in natural language understanding and reasoning.

Salesforce AI Research has introduced Reward-Guided Speculative Decoding (RSD), a novel framework for efficient inference in large language models (LLMs). The approach aims to strike a balance between speed and performance, addressing the computational challenges faced by LLMs during sequential token generation.

At a Glance

RSD combines a fast, lightweight “draft” model with a more robust “target” model.

A process reward model (PRM) evaluates draft model outputs in real time.

RSD introduces a controlled bias to prioritize high-reward outputs.

The approach enables “biased acceleration” and outperforms speculative decoding.

RSD achieves up to 4.4× faster inference and a +3.5-point average accuracy improvement.

Technical Details and Benefits of RSD

Delving into the technical aspects, RSD operates by integrating two models in a sequential yet collaborative manner. Initially, the draft model produces candidate tokens or reasoning steps at a low computational cost. Each candidate is then evaluated using a reward function, which acts as a quality gate. If a candidate token’s reward exceeds a predetermined threshold, the output is accepted; if not, the system calls upon the more computationally intensive target model to generate a refined token. This process is guided by a weighting function—typically a binary step function—that adjusts the reliance on the draft versus the target model.

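The control flow described above can be captured in a short sketch. The following Python is illustrative only: draft_model, target_model, and reward_model are hypothetical stand-ins for the paper's components, and the binary step weighting with a fixed threshold follows the description in the previous paragraph rather than the authors' actual code.

    # Minimal sketch of the RSD accept-or-refine loop (hypothetical interfaces).
    def rsd_generate(prompt, draft_model, target_model, reward_model,
                     threshold=0.7, max_steps=64):
        """Reward-guided speculative decoding at the level of reasoning steps."""
        sequence = prompt
        for _ in range(max_steps):
            # 1. The lightweight draft model proposes a candidate step at low cost.
            candidate = draft_model.propose(sequence)

            # 2. The process reward model (PRM) scores the candidate in real time,
            #    acting as a quality gate.
            reward = reward_model.score(sequence, candidate)

            # 3. Binary step weighting: keep the cheap draft output if its reward
            #    clears the threshold; otherwise call the more expensive target
            #    model to produce a refined step.
            if reward >= threshold:
                step = candidate
            else:
                step = target_model.generate(sequence)

            sequence += step
            if step.endswith("<eos>"):  # hypothetical end-of-sequence marker
                break
        return sequence

Because the target model is invoked only when the draft falls below the threshold, most steps cost a draft forward pass plus a PRM score, which is where the reported savings in computation come from.
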
The dynamic quality control afforded by the process reward model (PRM) ensures that only the most promising outputs bypass the target model, thereby saving on computation. One of the standout features of this approach is “biased acceleration,” where the controlled bias is not a detriment but rather a strategic choice to prioritize high-reward outcomes. This yields two key benefits: first, the overall inference process can be up to 4.4× faster compared to running the target model alone; second, it often delivers a +3.5-point average accuracy improvement over conventional parallel decoding baselines. In essence, RSD harmonizes efficiency with accuracy, allowing for a substantial reduction in the number of floating-point operations (FLOPs) while still delivering outputs that meet or even exceed the performance of the target model. The theoretical underpinnings and algorithmic details, such as the mixture distribution P_RSD and the adaptive acceptance criterion, provide a robust framework for practical deployment across diverse reasoning tasks.

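Written as a formula, the per-step mixture implied by this description can be sketched in LaTeX as follows; ω and τ are the weighting function and threshold mentioned above, while the exact definition of P_RSD in the paper (including any normalization) may differ:

    \[
    P_{\mathrm{RSD}}(y_t \mid x) =
        \omega\bigl(r(y_t)\bigr)\, P_{\mathrm{draft}}(y_t \mid x)
        + \Bigl(1 - \omega\bigl(r(y_t)\bigr)\Bigr)\, P_{\mathrm{target}}(y_t \mid x),
    \qquad
    \omega(r) = \mathbf{1}[\, r \ge \tau \,]
    \]

With the binary step weight, this simply says: accept the draft distribution whenever the candidate's reward clears τ, and defer to the target model otherwise.
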
Insights

The empirical validation of RSD is compelling. Experiments detailed in the paper demonstrate that, on challenging benchmarks such as GSM8K, MATH500, OlympiadBench, and GPQA, RSD consistently delivers superior performance. For instance, on the MATH500 benchmark, a dataset designed to test mathematical reasoning, RSD achieved an accuracy of 88.0 when configured with a 72B target model and a 7B PRM, compared to 85.6 for the target model running alone. Not only does this configuration require nearly 4.4× fewer FLOPs, it also improves reasoning accuracy. The results underscore the potential of RSD to outperform traditional methods, such as speculative decoding (SD) and even advanced search-based techniques like beam search or Best-of-N strategies.

Conclusion: A New Paradigm for Efficient LLM Inference

In conclusion, Reward-Guided Speculative Decoding (RSD) marks a significant milestone in the quest for more efficient LLM inference. By intelligently combining a lightweight draft model with a powerful target model, and by introducing a reward-based acceptance criterion, RSD effectively addresses the dual challenges of computational cost and output quality. The innovative approach of biased acceleration allows the system to selectively bypass expensive computations for high-reward outputs, thereby streamlining the inference process. The dynamic quality control mechanism—anchored by a process reward model—ensures that computational resources are allocated judiciously, engaging the target model only when necessary. With empirical results showing up to 4.4× faster inference and an average accuracy improvement of +3.5 over traditional methods, RSD not only paves the way for more scalable LLM deployments but also sets a new standard in the design of hybrid decoding frameworks.

Check out the Paper and GitHub Page. All credit for this research goes to the researchers of this project.

Disclaimer: info@kdj.com

The information provided is not trading advice. kdj.com does not assume any responsibility for any investments made based on the information provided in this article. Cryptocurrencies are highly volatile and it is highly recommended that you invest with caution after thorough research!

If you believe that the content used on this website infringes your copyright, please contact us immediately (info@kdj.com) and we will delete it promptly.
