Reward-Guided Speculative Decoding: A New Paradigm for Efficient LLM Inference

2025/02/15 03:44

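In recent years, the rapid scaling of large language models (LLMs) has led to extraordinary improvements in natural language understanding and reasoning capabilities.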

Salesforce AI Research has introduced Reward-Guided Speculative Decoding (RSD), a novel framework for efficient inference in large language models (LLMs). The approach aims to strike a balance between speed and performance, addressing the computational challenges faced by LLMs during sequential token generation.

At a Glance

RSD combines a fast, lightweight “draft” model with a more robust “target” model.

A process reward model (PRM) evaluates draft model outputs in real time.

RSD introduces a controlled bias to prioritize high-reward outputs.

The approach enables “biased acceleration” and outperforms speculative decoding.

RSD achieves up to 4.4× faster inference and +3.5 average accuracy improvement.

Technical Details and Benefits of RSD

RSD integrates two models in a sequential yet collaborative manner. First, the draft model produces candidate tokens or reasoning steps at low computational cost. Each candidate is then scored by a reward function, which acts as a quality gate. If the candidate's reward exceeds a predetermined threshold, the candidate is accepted; if not, the system calls on the more computationally intensive target model to generate a replacement. This process is guided by a weighting function, typically a binary step function, that controls how much the system relies on the draft model versus the target model.
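
The accept-or-fall-back loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function `rsd_generate`, its parameters, the default threshold of 0.7, and the toy stand-in models are all hypothetical names and values chosen for the example.

```python
from typing import Callable, List, Tuple

def rsd_generate(
    prompt: str,
    draft_step: Callable[[str], str],      # cheap model: context -> next step
    target_step: Callable[[str], str],     # expensive model: context -> next step
    reward: Callable[[str, str], float],   # PRM stand-in: (context, step) -> score
    threshold: float = 0.7,                # assumed value, for illustration only
    max_steps: int = 8,
    stop: str = "<eos>",
) -> Tuple[List[str], int]:
    """Return the generated steps and the number of target-model calls."""
    steps: List[str] = []
    target_calls = 0
    context = prompt
    for _ in range(max_steps):
        candidate = draft_step(context)  # low-cost proposal
        # Binary step weighting: keep the draft only if its reward
        # exceeds the threshold; otherwise regenerate with the target.
        if not reward(context, candidate) > threshold:
            candidate = target_step(context)
            target_calls += 1
        steps.append(candidate)
        if candidate == stop:
            break
        context += candidate
    return steps, target_calls

# Toy stand-ins so the sketch runs end to end (not real models).
def toy_draft(ctx: str) -> str:
    return "a" if len(ctx) % 2 == 0 else "b"

def toy_target(ctx: str) -> str:
    return "T"

def toy_reward(ctx: str, step: str) -> float:
    return 0.9 if step == "a" else 0.1

steps, calls = rsd_generate("", toy_draft, toy_target, toy_reward, max_steps=4)
print(steps, calls)
```

In this toy run the draft's "a" steps clear the threshold and are kept, while its "b" steps are rejected and replaced by the target's output, so the target model is invoked for only half of the steps.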

The dynamic quality control provided by the process reward model (PRM) ensures that only the most promising draft outputs bypass the target model, which saves computation. A standout feature of this approach is “biased acceleration”: the controlled bias is not a defect but a deliberate choice to prioritize high-reward outcomes. This yields two key benefits. First, the overall inference process can be up to 4.4× faster than running the target model alone. Second, it often yields a +3.5 average accuracy improvement over conventional parallel decoding baselines. In essence, RSD balances efficiency with accuracy, substantially reducing the number of floating-point operations (FLOPs) while still delivering outputs that match or even exceed the quality of the target model. The theoretical underpinnings and algorithmic details, such as the mixture distribution P_RSD and the adaptive acceptance criterion, provide a framework for practical deployment across diverse reasoning tasks.
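
One way to write down the mixture distribution induced by an accept-or-fall-back procedure of this kind is shown below. The notation is an assumption made for illustration (P_m for the draft model, P_M for the target model, r for the PRM reward, δ for the threshold, ω for the weighting function), and the paper's exact formulation may differ.

```latex
% Assumed notation: P_m = draft model, P_M = target model,
% r = PRM reward, \delta = acceptance threshold.
\[
  \omega(r) = \mathbb{1}[\, r > \delta \,]
\]
\[
  P_{\mathrm{RSD}}(y \mid x)
    = \omega\big(r(y \mid x)\big)\, P_m(y \mid x) + \nu\, P_M(y \mid x),
  \qquad
  \nu = 1 - \mathbb{E}_{y' \sim P_m(\cdot \mid x)}
        \big[\omega\big(r(y' \mid x)\big)\big]
\]
```

Here ν is the probability that a draft candidate is rejected, so the two terms sum to one over all outputs y. With the binary step function, the first term keeps exactly the draft outputs whose reward clears the threshold, and the second term hands the remaining probability mass to the target model.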

Insights

The empirical validation of RSD is compelling. Experiments detailed in the paper show that RSD consistently delivers superior performance on challenging benchmarks such as GSM8K, MATH500, OlympiadBench, and GPQA. For instance, on MATH500, a benchmark designed to test mathematical reasoning, RSD achieved an accuracy of 88.0 when configured with a 72B target model and a 7B PRM, compared to 85.6 for the target model running alone. This configuration not only uses nearly 4.4× fewer FLOPs but also improves reasoning accuracy. The results underscore the potential of RSD to outperform traditional methods such as speculative decoding (SD), and even advanced search-based techniques like beam search or Best-of-N strategies.
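
To build intuition for where a FLOPs reduction can come from, the sketch below uses a deliberately crude cost model. Everything in it is an assumption made for illustration: per-step cost is taken to be proportional to parameter count, the draft and PRM run on every step, and the target runs only on rejected steps. The function name `flops_ratio`, the 7B draft size, and the acceptance rates are hypothetical and are not figures from the paper.

```python
def flops_ratio(draft_b: float, prm_b: float, target_b: float,
                accept_rate: float) -> float:
    """Target-only cost divided by RSD cost per step.

    Assumes cost is proportional to parameter count (in billions) and
    that the target model runs only when a draft step is rejected.
    """
    if not 0.0 <= accept_rate <= 1.0:
        raise ValueError("accept_rate must be between 0 and 1")
    rsd_cost = draft_b + prm_b + (1.0 - accept_rate) * target_b
    return target_b / rsd_cost

# Hypothetical configuration: 7B draft (assumed), 7B PRM, 72B target.
for rate in (0.0, 0.5, 0.9, 1.0):
    print(f"accept_rate={rate:.1f} -> {flops_ratio(7, 7, 72, rate):.2f}x")
```

Under these assumptions the saving depends almost entirely on how often the draft is accepted: if every step is rejected, RSD costs more than the target alone, and even with every step accepted the ratio is capped at the target size divided by the combined draft and PRM size.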

Conclusion: A New Paradigm for Efficient LLM Inference

Reward-Guided Speculative Decoding (RSD) marks a significant milestone in the quest for more efficient LLM inference. By combining a lightweight draft model with a powerful target model, and by introducing a reward-based acceptance criterion, RSD addresses the dual challenges of computational cost and output quality. Biased acceleration lets the system skip expensive computation for high-reward outputs, streamlining inference. The dynamic quality control mechanism, anchored by a process reward model, allocates computational resources judiciously, engaging the target model only when necessary. With empirical results showing up to 4.4× faster inference and an average accuracy improvement of +3.5 over traditional methods, RSD paves the way for more scalable LLM deployments and sets a new standard in the design of hybrid decoding frameworks.

All credit for this research goes to the researchers of this project.
