Recent advancements in AI scaling laws have shifted from merely increasing model size and training data to optimising inference-time computation. This approach, exemplified by models like OpenAI's o1 and DeepSeek's R1, enhances model performance by leveraging additional computational resources during inference.
Test-time budget forcing has emerged as an efficient technique in large language models (LLMs), enabling improved performance with minimal token generation. Similarly, inference-time scaling has gained traction in diffusion models, particularly in reward-based sampling, where iterative refinement helps generate outputs that better align with user preferences. This method is crucial for text-to-image generation, where naïve sampling often fails to fully capture intricate specifications, such as object relationships and logical constraints.
Inference-time scaling methods for diffusion models can be broadly categorized into fine-tuning-based and particle-sampling approaches. Fine-tuning improves model alignment with specific tasks but requires retraining for each use case, limiting scalability. In contrast, particle sampling—used in techniques like SVDD and CoDe—selects high-reward samples iteratively during denoising, significantly improving output quality.
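To make the particle-sampling idea concrete, the sketch below shows the generic branch-and-select loop in its simplest form. It illustrates the general technique rather than the exact SVDD or CoDe procedures, and `denoise_step` and `reward` are placeholders for a pretrained denoiser and a reward model.

```python
import torch

def particle_sampling(denoise_step, reward, x_T, num_steps, num_particles, branch):
    """Generic particle-sampling sketch: at every denoising step, branch each
    particle into several stochastic continuations and keep only the
    highest-reward candidates. `denoise_step` and `reward` are placeholders."""
    # Start from `num_particles` copies of the initial noise.
    particles = x_T.repeat(num_particles, *([1] * (x_T.dim() - 1)))

    for t in reversed(range(num_steps)):
        # Branch: propose `branch` stochastic continuations per particle.
        candidates = torch.cat(
            [denoise_step(particles, t) for _ in range(branch)], dim=0
        )
        # Score the partially denoised candidates with the reward model.
        scores = reward(candidates)
        # Select: keep the top `num_particles` candidates for the next step.
        particles = candidates[scores.topk(num_particles).indices]

    return particles
```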
While these methods have been effective for diffusion models, their application to flow models has been limited due to the deterministic nature of their generation process. Recent work, including SoP, has introduced stochasticity to flow models, enabling particle sampling-based inference-time scaling. This study expands on such efforts by modifying the reverse kernel, further enhancing sampling diversity and effectiveness in flow-based generative models.
Researchers from KAIST propose an inference-time scaling method for pretrained flow models, addressing their limitations in particle sampling due to a deterministic generative process. They introduce three key innovations:
1. SDE-based generation to enable stochastic sampling.
2. VP interpolant conversion for enhancing sample diversity.
3. Rollover Budget Forcing (RBF) for adaptive computational resource allocation (sketched below).
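As a rough illustration of what adaptive budget allocation with rollover could look like, the sketch below draws candidate continuations one at a time at each denoising step, stops early once a candidate improves on the best reward seen so far, and carries the unspent budget over to the next step. This is a hypothetical reading of the idea, and `propose` and `reward` are placeholder functions rather than the authors' implementation.

```python
def rollover_budget_forcing(propose, reward, x, num_steps, budget_per_step):
    """Hypothetical sketch of adaptive compute allocation with rollover.
    `propose(x, t)` draws one stochastic continuation of x at step t and
    `reward(x)` returns a scalar reward estimate; both are placeholders."""
    best_reward = float("-inf")
    carry = 0  # unused evaluations rolled over from earlier steps

    for t in reversed(range(num_steps)):
        budget = budget_per_step + carry
        step_best, step_best_reward, used = None, float("-inf"), 0

        for _ in range(budget):
            candidate = propose(x, t)   # one stochastic continuation
            used += 1
            r = reward(candidate)
            if r > step_best_reward:
                step_best, step_best_reward = candidate, r
            if r > best_reward:         # improvement found: stop searching here
                break

        best_reward = max(best_reward, step_best_reward)
        carry = budget - used           # roll unused budget over to the next step
        x = step_best                   # continue from this step's best candidate

    return x
```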
Experimental results on compositional text-to-image generation tasks with FLUX, a pretrained flow model, demonstrate that these techniques effectively improve reward alignment. The proposed approach outperforms prior methods, showcasing the advantages of inference-time scaling in flow models, particularly when combined with gradient-based techniques for differentiable rewards like aesthetic image generation.
Inference-Time Reward Alignment in Pretrained Flow Models via Particle Sampling
The goal of inference-time reward alignment is to generate high-reward samples from a pretrained flow model without any retraining. The objective is defined as follows:
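In its standard form (written here with an assumed regularisation weight α), the reward-alignment objective reads

$$
p^{*} \;=\; \arg\max_{q}\;\mathbb{E}_{x\sim q}\big[R(x)\big]\;-\;\alpha\,D_{\mathrm{KL}}\big(q\,\|\,p\big),
\qquad\text{equivalently}\qquad
p^{*}(x)\;\propto\;p(x)\,\exp\!\big(R(x)/\alpha\big),
$$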
where R denotes the reward function and p(x) represents the original data distribution; the KL-divergence term keeps generated samples close to p(x) to maintain image quality.
Since directly sampling from this reward-aligned target distribution is challenging, the study adapts particle sampling techniques commonly used in diffusion models. However, flow models rely on deterministic sampling, so even when high-reward samples are found in early iterations, no fresh randomness is injected to explore new directions around them. To address this, the researchers introduce inference-time stochastic sampling by converting the deterministic generation process into a stochastic one.
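The conversion from deterministic to stochastic generation follows the standard relation between a probability-flow ODE and an SDE that shares the same marginals; schematically, with v_t the learned velocity field and σ_t an assumed, freely chosen noise schedule:

$$
\mathrm{d}x_t = v_t(x_t)\,\mathrm{d}t
\quad\longrightarrow\quad
\mathrm{d}x_t = \Big[v_t(x_t) + \tfrac{\sigma_t^{2}}{2}\,\nabla_x \log p_t(x_t)\Big]\mathrm{d}t \;+\; \sigma_t\,\mathrm{d}W_t .
$$

Both processes produce the same marginal distributions $p_t$, but the SDE injects fresh noise at every step, which is what makes particle-style branching and resampling meaningful; for affine interpolants the score $\nabla_x \log p_t$ can be expressed in terms of the learned velocity, so no additional network is needed.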
Moreover, they propose interpolant conversion to enlarge the search space and improve diversity by aligning flow model sampling with diffusion models. A dynamic compute allocation strategy is employed to enhance efficiency during inference-time scaling.
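The interpolant conversion can be stated concretely for affine interpolants of the form below (conventions vary; here t = 1 corresponds to data, and the specific schedules are illustrative):

$$
x_t = \alpha_t\,x_1 + \sigma_t\,\epsilon,
\qquad
\text{linear (rectified flow): } (\alpha_t,\sigma_t) = (t,\,1-t),
\qquad
\text{VP: } \alpha_t^{2} + \sigma_t^{2} = 1 .
$$

Because the marginal at a given time is determined by the signal-to-noise ratio $\alpha_t/\sigma_t$, a sample produced under the linear interpolant at time $t$ can be mapped to the VP interpolant at the matched time $s$ (the one with the same signal-to-noise ratio) by a simple rescaling, $x^{\mathrm{VP}}_{s} = (\alpha^{\mathrm{VP}}_{s}/\alpha_t)\,x_t$, which is one way to align flow-model sampling with diffusion-style (VP) sampling and enlarge the search space.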
The study presents experimental results on particle sampling methods for inference-time reward alignment, focusing on compositional text-to-image and quantity-aware image generation tasks using FLUX as the pretrained flow model. Metrics such as VQAScore and RSS are used to assess alignment and accuracy.
The results indicate that inference-time stochastic sampling improves efficiency, and interpolant conversion further enhances performance. Flow-based particle sampling yields high-reward outputs compared to diffusion models without compromising image quality. The proposed RBF method effectively optimizes budget allocation, achieving the best reward alignment and accuracy results.
Qualitative and quantitative findings confirm the effectiveness of this approach in generating precise, high-quality images.
In summary, this research introduces an inference-time scaling method for flow models, incorporating three key innovations:
1. ODE-to-SDE conversion for enabling particle sampling.
2. Linear-to-VP interpolant conversion to enhance diversity and search efficiency.
3. RBF for adaptive compute allocation.
While diffusion models benefit from stochastic sampling during denoising, flow models require tailored approaches due to their deterministic nature. The proposed VP-SDE-based generation effectively integrates particle sampling, and RBF optimizes compute usage. Experimental results demonstrate that this method surpasses existing inference-time scaling techniques, improving performance while maintaining high-quality outputs in flow-based image and video generation models.