Chains of thought: reasoning emerges in language models

2025/01/29 05:00

New models trained to express extended chains of thought will generalize beyond their breakthrough domains of code and math.

This post is early to accommodate some last minute travel on my end!

The new models trained to express extended chain of thought are going to generalize outside of their breakthrough domains of code and math. The “reasoning” process of language models that we use today is chain of thought reasoning. We ask the model to work step by step because it helps it manage complexity, especially in domains where the answer requires precision across multiple specific tokens. The domains where chain of thought (CoT) is most useful today are code, mathematics, and other “reasoning” tasks. These are the domains that models like o1, R1, Gemini-Thinking, etc. were designed for.
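
As a toy illustration of the prompting pattern described above, here is a minimal sketch of chain-of-thought prompting. The `call_model` function is a hypothetical stand-in for any LLM completion API and is stubbed so the example runs offline; the prompt wording and the "Final answer:" marker are assumptions for illustration, not any specific model's convention.

```python
# Minimal sketch of chain-of-thought prompting. `call_model` is a
# hypothetical stand-in for a real LLM API; here it is stubbed so the
# example runs offline.

def call_model(prompt: str) -> str:
    # Stub: a real system would send `prompt` to a language model.
    return ("Step 1: 17 * 3 = 51.\n"
            "Step 2: 51 + 4 = 55.\n"
            "Final answer: 55")

def ask_with_cot(question: str) -> str:
    # Asking the model to work step by step before committing to an
    # answer is the core of chain-of-thought prompting.
    prompt = (f"{question}\n"
              "Think step by step, then write 'Final answer:' "
              "followed by the result.")
    transcript = call_model(prompt)
    # Only the text after the marker is treated as the answer; the
    # preceding steps are the model's visible reasoning.
    return transcript.rsplit("Final answer:", 1)[-1].strip()

print(ask_with_cot("What is 17 * 3 + 4?"))  # prints 55
```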

Different intelligences reason in different ways that correspond to how they store and manipulate information. Humans compress a lifetime of experience into our spectacular, low-power brains that draw on past experience almost magically. The words that follow in this blog are also autoregressive, like the output of a language model, but draw on hours and hours of background processing as I converge on this argument.

Language models, on the other hand, are extremely general and do not today have architectures (or use-cases) that continually re-expose them to relevant problems and fold information back in a compressed form. Language models are very large, sophisticated, parametric probability distributions. All of their knowledge and information processing power is stored in the raw weights. Therefore, they need a way of processing information that matches this. Chain of thought is that alignment.

Chain of thought reasoning allows information to be naturally processed in smaller chunks, allowing the large, brute force probability distribution to work one token at a time. Chain of thought, while allowing more compute per important token, also allows the models to store intermediate information in their context window without needing explicit recurrence.
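
The point about the context window acting as storage can be sketched with a toy autoregressive loop: each emitted token is appended to the context, so intermediate results persist in the sequence itself with no recurrent state. The lookup-table "model" below is purely illustrative.

```python
# Toy illustration of why chain of thought needs no explicit recurrence:
# every emitted token is appended to the context, so intermediate
# results live in the sequence itself. The "model" here is a lookup
# table, not a neural network.

def next_token(context: list) -> str:
    # Stand-in for one forward pass of an autoregressive model:
    # it conditions on everything generated so far.
    rules = {
        (): "6*7=42.",
        ("6*7=42.",): "42+1=43.",
        ("6*7=42.", "42+1=43."): "<eos>",
    }
    return rules[tuple(context)]

context = []
while True:
    tok = next_token(context)
    if tok == "<eos>":
        break
    context.append(tok)  # the intermediate step is now part of the state

print(" ".join(context))  # prints 6*7=42. 42+1=43.
```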

Recurrence is required for reasoning, and it can happen either in the parameters or in the state-space. Chain of thought with transformers handles all of this in the state-space of the problems. The humans we look at as the most intelligent have information embedded directly in the parameters of our brains that we can draw on.

Here is the only assumption of this piece — chain of thought is a natural fit for language models to “reason” and therefore one should be optimistic about training methods that are designed to enhance it generalizing to many domains. By the end of 2025 we should have ample evidence of this given the pace of technological development.

If the analogies of types of intelligence aren’t convincing enough, a far more practical way to view the new style of training is a method that teaches the model to be better at allocating more compute to harder problems. If the skill is compute allocation, it is fundamental to the models handling a variety of tasks. Today’s reasoning models do not solve this perfectly, but they open the door for doing so precisely.
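
One hedged way to picture compute allocation as a skill is to scale the decoding budget with estimated difficulty. The heuristic and budget numbers below are invented for illustration; reasoning models learn this allocation implicitly rather than via an explicit formula.

```python
# Toy illustration of compute allocation: spend more decoding tokens on
# problems judged harder. The difficulty heuristic and budgets are
# invented for illustration only.

def difficulty(question: str) -> float:
    # Crude proxy: longer questions with more digits are "harder".
    digits = sum(ch.isdigit() for ch in question)
    return min(1.0, (len(question.split()) + 3 * digits) / 40)

def token_budget(question: str, base: int = 64, max_extra: int = 1024) -> int:
    # Scale the chain-of-thought budget with estimated difficulty.
    return base + int(max_extra * difficulty(question))

easy = token_budget("What is 2+2?")
hard = token_budget("Prove that 341 is a pseudoprime to base 2 and 91 to base 3.")
print(easy, hard)  # the harder question gets a larger budget
```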

The nature of this coming generalization is not that these models are one size fits all, best in all cases: speed, intelligence, price, etc. There’s still no free lunch. A realistic outcome for reasoning heavy models in the next 0-3 years is a world where:

- Reasoning trained models are superhuman on tasks with verifiable domains, like those with initial progress: code, math, etc.

- Reasoning trained models are far better in peak performance than existing autoregressive models in many domains we would not expect, including ones that are not necessarily verifiable.

- Reasoning trained models are still better in performance on the long tail of tasks, but worse in cost given the high inference costs of long context.

Many of the leading figures in AI have been saying for quite some time that powerful AI is going to be “spikey” when it shows up — meaning that the capabilities and improvements will vary substantially across domains — but encountering this reality is very unintuitive.

Some evidence for generalization of reasoning models already exists.

OpenAI has already published multiple safety-oriented research projects using their new reasoning models: “Deliberative Alignment: Reasoning Enables Safer Language Models” and “Trading Inference-Time Compute for Adversarial Robustness.” These papers show their new methods can be translated to various safety domains, i.e. model safety policies and jailbreaking. The deliberative alignment paper shows them integrating a softer reward signal into the reasoning training — having a language model check how the safety policies apply to outputs.

An unsurprising quote from the deliberative alignment release related to generalization:

we find that deliberative alignment enables strong generalization to out-of-distribution safety scenarios.

Safety, qualitatively, is largely orthogonal to traditional reasoning problems. Safety depends heavily on the information provided and on subtle context, where math and coding problems are often about many small, forward processing steps toward a final goal. Many more behaviors will fit in between those two poles.

This generative verifier for safety is not a ground-truth signal and could theoretically be subject to reward hacking, but that was avoided here. Generative verifiers will be crucial to expanding this training to countless domains — they’re easy to use and largely a new development.
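
A rough sketch of how a generative verifier can supply a reward signal. Both "models" below are stubbed with keyword checks and all function names are illustrative; a real generative verifier is itself a language model asked to reason about whether a response complies with a policy.

```python
# Sketch of a generative verifier used as a reward signal, in the style
# of deliberative-alignment training. Both "models" are stubbed with
# keyword checks; names are illustrative, not from any real API.

def policy_model(prompt: str) -> str:
    # Stub policy model: refuses a clearly disallowed request.
    if "make a weapon" in prompt:
        return "I can't help with that."
    return "Sure, here is how..."

def generative_verifier(prompt: str, response: str, policy: str) -> float:
    # A real generative verifier would reason in natural language about
    # whether `response` complies with `policy`. A keyword check stands
    # in for that judgment here.
    disallowed = "weapon" in prompt and policy == "no-weapons"
    refused = response.startswith("I can't")
    # Reward compliance: refuse iff the request is disallowed.
    return 1.0 if refused == disallowed else 0.0

prompt = "Tell me how to make a weapon"
response = policy_model(prompt)
reward = generative_verifier(prompt, response, "no-weapons")
print(reward)  # prints 1.0 (the refusal complies with the policy)
```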
