Reward-Guided Speculative Decoding: A New Paradigm for Efficient LLM Inference

2025/02/15 03:44

In recent years, the rapid scaling of large language models (LLMs) has led to extraordinary improvements in natural language understanding and reasoning capabilities.

Salesforce AI Research has introduced Reward-Guided Speculative Decoding (RSD), a novel framework for efficient inference in large language models (LLMs). The approach aims to strike a balance between speed and performance, addressing the computational challenges faced by LLMs during sequential token generation.

At a Glance

RSD combines a fast, lightweight “draft” model with a more robust “target” model.

A process reward model (PRM) evaluates draft model outputs in real time.

RSD introduces a controlled bias to prioritize high-reward outputs.

The approach enables “biased acceleration” and outperforms speculative decoding.

RSD achieves up to 4.4× faster inference and +3.5 average accuracy improvement.

Technical Details and Benefits of RSD

RSD integrates two models in a sequential yet collaborative manner. First, the draft model produces candidate tokens or reasoning steps at low computational cost. Each candidate is then scored by a reward function, which acts as a quality gate. If the candidate’s reward exceeds a predetermined threshold, the output is accepted; if not, the system calls on the more computationally intensive target model to generate a refined replacement. This process is governed by a weighting function, typically a binary step function, that adjusts the reliance on the draft versus the target model.
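The accept-or-fall-back loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `rsd_generate`, its parameters, and the three toy stand-ins for the draft model, target model, and reward model are all hypothetical, and the weighting function is taken to be the binary step mentioned in the text.

```python
from typing import Callable, List, Tuple

# Context is modelled as a list of strings: the prompt followed by accepted steps.
Context = List[str]


def rsd_generate(
    prompt: str,
    draft_step: Callable[[Context], str],      # cheap model (hypothetical interface)
    target_step: Callable[[Context], str],     # expensive model (hypothetical interface)
    reward: Callable[[Context, str], float],   # process reward model score
    threshold: float = 0.7,                    # illustrative value, not from the paper
    max_steps: int = 8,
) -> Tuple[List[str], int]:
    """Return the generated steps and the number of target-model calls."""
    context: Context = [prompt]
    steps: List[str] = []
    target_calls = 0
    for _ in range(max_steps):
        candidate = draft_step(context)
        # Binary step weighting: keep the draft only if its reward exceeds the threshold.
        if not reward(context, candidate) > threshold:
            candidate = target_step(context)   # fall back to the target model
            target_calls += 1
        steps.append(candidate)
        context.append(candidate)
    return steps, target_calls


# Toy stand-ins so the sketch runs without any real model.
def toy_draft(ctx: Context) -> str:
    return f"draft{len(ctx)}"


def toy_target(ctx: Context) -> str:
    return f"target{len(ctx)}"


def toy_reward(ctx: Context, step: str) -> float:
    # Pretend the draft is only good on every other step.
    return 0.9 if len(ctx) % 2 == 0 else 0.2


if __name__ == "__main__":
    steps, calls = rsd_generate("Q:", toy_draft, toy_target, toy_reward, max_steps=4)
    print(steps, calls)
```

In this toy run the target model is consulted only on the steps where the reward falls below the threshold, which is the source of the compute savings; a real system would also need a stopping condition and batched scoring, both omitted here.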

The dynamic quality control provided by the process reward model (PRM) ensures that only the most promising draft outputs bypass the target model, which saves computation. A standout feature of this approach is “biased acceleration”: the controlled bias is not a detriment but a deliberate choice to prioritize high-reward outcomes. This yields two key benefits. First, overall inference can be up to 4.4× faster than running the target model alone. Second, it often yields a +3.5 average accuracy improvement over conventional parallel decoding baselines. In essence, RSD balances efficiency with accuracy, substantially reducing the number of floating-point operations (FLOPs) while still delivering outputs that meet or even exceed the quality of the target model. The theoretical underpinnings and algorithmic details, such as the mixture distribution P_RSD and the adaptive acceptance criterion, provide a framework for practical deployment in diverse reasoning tasks.
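The mixture distribution mentioned above can be written out by following the accept-or-fall-back procedure literally. The notation here is an assumption made for illustration and may differ from the paper's: $P_m$ is the draft model, $P_M$ the target model, $r$ the PRM reward, $\delta$ the acceptance threshold, and $\nu$ the probability that a draft candidate is rejected.

```latex
% Binary step weighting: accept a draft candidate z iff its reward exceeds the threshold.
w(r) = \mathbf{1}[\, r > \delta \,]

% Sampling z from the draft, keeping it with probability w, and otherwise
% resampling from the target induces the mixture
P_{\mathrm{RSD}}(z \mid x) = w\big(r(x, z)\big)\, P_m(z \mid x) + \nu(x)\, P_M(z \mid x),
\qquad
\nu(x) = 1 - \sum_{z'} w\big(r(x, z')\big)\, P_m(z' \mid x).

% Normalization check:
\sum_z P_{\mathrm{RSD}}(z \mid x) = \big(1 - \nu(x)\big) + \nu(x) = 1 .
```

The bias is visible in the first term: draft candidates with high reward keep their draft probability, while low-reward mass is redirected to the target model instead of being matched token for token to the target distribution.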

Insights

The empirical validation of RSD is compelling. Experiments detailed in the paper show that, on challenging benchmarks such as GSM8K, MATH500, OlympiadBench, and GPQA, RSD consistently delivers superior performance. For instance, on MATH500, a benchmark designed to test mathematical reasoning, RSD achieved an accuracy of 88.0 when configured with a 72B target model and a 7B PRM, compared to 85.6 for the target model running alone, a gain of 2.4 points. This configuration not only uses nearly 4.4× fewer FLOPs but also improves reasoning accuracy. The results underscore the potential of RSD to outperform traditional methods, such as speculative decoding (SD) and even advanced search-based techniques like beam search or Best-of-N strategies.
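To see how FLOP savings of this order can arise, the following back-of-the-envelope sketch may help. It is a crude cost model and not the paper's accounting: it assumes roughly 2 FLOPs per parameter per generated token for a forward pass, charges the draft model and PRM on every token and the target model only on rejected ones, and ignores the cost of the target model reading accepted context. The function name, the 7B draft size, and the acceptance rates are hypothetical.

```python
def flops_ratio(target_b: float, draft_b: float, prm_b: float, accept_rate: float) -> float:
    """Target-only FLOPs divided by RSD FLOPs per generated token (crude model).

    Model sizes are in billions of parameters; accept_rate is the fraction of
    draft candidates whose reward clears the threshold.
    """
    if not 0.0 <= accept_rate <= 1.0:
        raise ValueError("accept_rate must be in [0, 1]")
    baseline = 2.0 * target_b                           # target model alone
    rsd = 2.0 * (draft_b + prm_b)                       # draft + PRM on every token
    rsd += (1.0 - accept_rate) * 2.0 * target_b         # target only on rejections
    return baseline / rsd


if __name__ == "__main__":
    # 72B target and 7B PRM as in the text; the 7B draft size is an assumption.
    for rate in (0.0, 0.5, 0.9, 1.0):
        print(f"accept_rate={rate:.1f} -> {flops_ratio(72, 7, 7, rate):.2f}x fewer FLOPs")
```

Under these assumptions the savings depend almost entirely on the acceptance rate: if the PRM rejects every draft, RSD costs more than the target model alone, and large reductions only appear when most draft steps are accepted.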

Conclusion: A New Paradigm for Efficient LLM Inference

In conclusion, Reward-Guided Speculative Decoding (RSD) marks a significant milestone in the quest for more efficient LLM inference. By combining a lightweight draft model with a powerful target model, and by introducing a reward-based acceptance criterion, RSD addresses the dual challenges of computational cost and output quality. Biased acceleration allows the system to selectively bypass expensive computations for high-reward outputs, thereby streamlining the inference process. The dynamic quality control mechanism, anchored by a process reward model, ensures that computational resources are allocated judiciously, engaging the target model only when necessary. With empirical results showing up to 4.4× faster inference and an average accuracy improvement of +3.5 over traditional methods, RSD not only paves the way for more scalable LLM deployments but also sets a new standard in the design of hybrid decoding frameworks.

Check out the Paper and GitHub Page. All credit for this research goes to the researchers of this project.
