Despite the breakthrough progress brought by Chain-of-Thought (CoT) prompting, Large Language Models (LLMs) still face significant challenges on complex reasoning tasks. In particular, the computational overhead of generating long CoT sequences directly affects inference latency and memory requirements.
Because LLM decoding is autoregressive, longer CoT sequences require more decoding steps, and the cost of the attention layers grows quadratically with sequence length, driving up both processing time and memory usage. Striking a balance between reasoning accuracy and computational efficiency has therefore become a critical challenge, as attempts to shorten the reasoning process often compromise the model's problem-solving capabilities.
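To make the scaling concrete, here is a back-of-the-envelope sketch (our illustration, not something from the paper) that counts the attention interactions accumulated over autoregressive decoding; in this toy setting, doubling the CoT length roughly triples the total work.

```python
# Toy estimate: at decoding step t, attention looks back over the prompt plus all
# t previously generated tokens, so the accumulated work grows quadratically with
# the length of the generated CoT.

def attention_ops(prompt_len: int, cot_len: int) -> int:
    """Count pairwise attention interactions accumulated over autoregressive decoding."""
    return sum(prompt_len + t for t in range(1, cot_len + 1))

base = attention_ops(prompt_len=200, cot_len=300)
long_cot = attention_ops(prompt_len=200, cot_len=600)
print(f"300-token CoT: {base:,} interactions")      # 105,150
print(f"600-token CoT: {long_cot:,} interactions")  # 300,300 -- ~2.9x, not 2x
```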
To address the computational challenges of Chain-of-Thought (CoT) reasoning, various methodologies have been developed. Some approaches focus on streamlining the reasoning process by simplifying or skipping certain thinking steps, while others attempt to generate steps in parallel. A different strategy involves compressing reasoning steps into continuous latent representations, enabling LLMs to reason without generating explicit word tokens.
Moreover, prompt compression techniques aim to handle complex instructions and long-context inputs more efficiently. These range from using lightweight language models to generate concise prompts, to employing implicit continuous tokens for task representation, to compressing prompts directly by filtering for highly informative tokens.
In this work, researchers from The Hong Kong Polytechnic University and the University of Science and Technology of China propose TokenSkip, an approach to optimize CoT processing in LLMs. It enables models to skip less important tokens within CoT sequences while maintaining connections between critical reasoning tokens, with adjustable compression ratios.
The system works by first constructing compressed CoT training data through token pruning, followed by supervised fine-tuning. Initial testing across multiple models, including LLaMA-3.1-8B-Instruct and the Qwen2.5-Instruct series, shows promising results, particularly in maintaining reasoning capabilities while significantly reducing computational overhead.
The architecture of TokenSkip is built on the fundamental principle that different reasoning tokens contribute varying levels of importance to reaching the final answer. It consists of two main phases: training data preparation and inference.
During the training phase, the system generates CoT trajectories using the target LLM, and each trajectory is then pruned with a randomly selected compression ratio. The token pruning process is guided by an “importance scoring” mechanism, which assigns higher scores to tokens that are more critical for reaching the final answer.
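As a rough illustration of how ratio-guided pruning might look, the following minimal Python sketch keeps only the highest-scoring fraction of tokens in a CoT; the whitespace tokenization and the toy scorer are illustrative assumptions standing in for the paper's importance-scoring mechanism, and during data construction the keep ratio would be drawn at random for each trajectory.

```python
# Minimal sketch of compression-ratio-guided token pruning. The scorer below is a
# stand-in: TokenSkip relies on an importance-scoring mechanism, but the concrete
# scorer and the whitespace tokenization here are illustrative assumptions.
from typing import Callable, List

def prune_cot(tokens: List[str],
              score_fn: Callable[[List[str]], List[float]],
              keep_ratio: float) -> List[str]:
    """Keep the highest-scoring fraction of CoT tokens, preserving their original order."""
    scores = score_fn(tokens)
    k = max(1, int(len(tokens) * keep_ratio))
    top = sorted(sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)[:k])
    return [tokens[i] for i in top]

def toy_score(tokens: List[str]) -> List[float]:
    # Pretend digits and arithmetic operators matter most in a math CoT.
    return [2.0 if any(c.isdigit() for c in t) or t in "+-*/=" else 1.0 for t in tokens]

cot = "so 12 * 4 = 48 , then add 6 to get 54".split()
print(prune_cot(cot, toy_score, keep_ratio=0.6))
# -> ['12', '*', '4', '=', '48', '6', '54']
```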
At inference time, TokenSkip keeps the standard autoregressive decoding approach but improves efficiency by enabling LLMs to skip less important tokens. The input is formatted so that the question and the target compression ratio are separated by end-of-sequence tokens.
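The snippet below sketches that input format; the `<eos>` delimiter string and the exact template are placeholders for illustration, since the real end-of-sequence token is model-specific.

```python
# Minimal sketch of the inference-time input described above: question and desired
# compression ratio separated by end-of-sequence tokens. "<eos>" is a placeholder
# for the model-specific end-of-sequence token, not the literal string TokenSkip uses.

EOS = "<eos>"  # assumption: swap in the tokenizer's actual eos_token

def build_prompt(question: str, compression_ratio: float) -> str:
    """Format a TokenSkip-style input: question <eos> ratio <eos>."""
    return f"{question}{EOS}{compression_ratio}{EOS}"

print(build_prompt("What is 17 * 24?", compression_ratio=0.6))
# The fine-tuned model then decodes a compressed CoT autoregressively, skipping
# tokens it has learned are unimportant at the requested ratio.
```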
The results show that larger language models are more capable of maintaining performance while achieving higher compression rates. The Qwen2.5-14B-Instruct model achieves remarkable results with only a 0.4% performance drop while reducing token usage by 40%.
When compared with alternative approaches like prompt-based reduction and truncation, TokenSkip shows superior performance. While prompt-based reduction fails to achieve target compression ratios and truncation leads to significant performance degradation, TokenSkip maintains the specified compression ratio while preserving reasoning capabilities. On the MATH-500 dataset, it achieves a 30% reduction in token usage with less than a 4% performance drop.
In this paper, the researchers introduce TokenSkip, which represents a significant advancement in optimizing CoT processing for LLMs through a controllable compression mechanism based on token importance. The method succeeds by maintaining reasoning accuracy while significantly reducing computational overhead, selectively preserving critical tokens and skipping less important ones. The approach has proven effective across LLMs, showing minimal performance degradation even at substantial compression ratios.
This research opens new possibilities for advancing efficient reasoning in LLMs, establishing a foundation for future developments in computational efficiency while maintaining robust reasoning capabilities.
Check out the Paper. All credit for this research goes to the researchers of this project.