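The success of many of the breakthrough artificial intelligence (AI) applications of the past few years can be attributed to a broad class of algorithms known as sequence models.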
Sequence models have played a crucial role in the development of several groundbreaking artificial intelligence (AI) applications in recent years. For instance, the algorithms that power popular large language models like Llama, ChatGPT, and Gemini belong to a specific category of sequence models that perform next-token (or word) prediction.
Text-to-video tools such as Sora are also built on sequence models, but in these cases the models predict the entire output sequence at once rather than just the next token.
Traditionally, sequence models built for next-token prediction can generate sequences of variable length but struggle with long-term planning. Full-sequence models, on the other hand, excel at long-term planning but are limited to fixed-length input and output sequences. Each class of model therefore comes with its own trade-offs, and neither offers the best of both.
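A rough way to see this trade-off is to compare the two generation loops. The Python below is a toy sketch, not code from the researchers: the function names and the stand-in "models" passed in as callables are hypothetical. The next-token loop decides the output length as it goes, while the full-sequence version must fix the length before it starts and then refines every position jointly, which is one intuition for why such models tend to plan better over long horizons.

```python
import random


def next_token_generate(predict_next, prompt, stop_token, max_len):
    """Next-token sketch: append one predicted token at a time, so length is decided on the fly."""
    seq = list(prompt)
    while len(seq) < max_len:
        token = predict_next(seq)  # the "model" only conditions on tokens produced so far
        seq.append(token)
        if token == stop_token:  # generation may end at any length
            break
    return seq


def full_sequence_generate(denoise_step, length, num_steps, rng):
    """Full-sequence sketch: start from noise of a fixed length and refine all positions together."""
    seq = [rng.gauss(0.0, 1.0) for _ in range(length)]  # output length is fixed before generation
    for step in reversed(range(num_steps)):
        seq = denoise_step(seq, step)  # each step sees and updates the whole sequence
    return seq


# Hypothetical toy "models", purely for illustration.
def count_up(seq):
    return seq[-1] + 1


def shrink(seq, step):
    return [0.5 * x for x in seq]


if __name__ == "__main__":
    print(next_token_generate(count_up, [1], stop_token=5, max_len=10))  # [1, 2, 3, 4, 5]
    print(len(full_sequence_generate(shrink, 8, 3, random.Random(0))))  # 8
```

Here the next-token run stops after five tokens because the toy model emitted the stop token, whereas the full-sequence run always returns exactly the eight positions it was given.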
Researchers at MIT CSAIL and the Technical University of Munich have proposed a novel approach called Diffusion Forcing to combine the strengths of both next-token and full-sequence models. This technique improves both the quality and adaptability of sequence models.
At its core, Diffusion Forcing builds on "Teacher Forcing," which breaks sequence generation down into smaller, manageable steps by predicting one token at a time. Diffusion Forcing adds the concept of "fractional masking," in which varying amounts of noise are added to the data, mimicking the process of partially obscuring, or masking, tokens. The model is then trained to remove this noise and predict the next few tokens, so it handles denoising and future prediction at the same time. This makes the model highly adaptable to tasks involving noisy or incomplete data and enables it to generate precise, stable outputs.
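One way to picture "fractional masking" is as a noise level assigned to each token. The Python below is a toy illustration, not the researchers' implementation: tokens are plain floats, and the number of levels `K`, the helper names, and the square-root noise schedule are all assumptions. It contrasts three ways of assigning noise levels to a training sequence: teacher forcing viewed as masking (earlier tokens clean, the token to predict fully masked), full-sequence diffusion (one shared level for every token), and fractional masking (an independent level per token).

```python
import math
import random

K = 10  # assumed number of noise levels: 0 = clean token, K = pure noise (fully masked)


def teacher_forcing_levels(seq_len, t):
    """Teacher forcing viewed as masking: tokens before t are clean, position t onward is fully masked."""
    return [0 if i < t else K for i in range(seq_len)]


def full_sequence_levels(seq_len, rng):
    """Full-sequence diffusion: every token shares one randomly drawn noise level."""
    k = rng.randint(0, K)
    return [k] * seq_len


def fractional_masking_levels(seq_len, rng):
    """Fractional masking (sketch): each token draws its own independent noise level."""
    return [rng.randint(0, K) for _ in range(seq_len)]


def add_noise(tokens, levels, rng):
    """Partially obscure each scalar token according to its level (toy square-root schedule)."""
    noisy = []
    for x, k in zip(tokens, levels):
        keep = 1.0 - k / K  # fraction of signal variance kept: 1 at level 0, 0 at level K
        noisy.append(math.sqrt(keep) * x + math.sqrt(1.0 - keep) * rng.gauss(0.0, 1.0))
    return noisy


def make_training_example(tokens, rng):
    """Build one (input, target) pair: a model would learn to recover `tokens` from the noisy input."""
    levels = fractional_masking_levels(len(tokens), rng)
    return (add_noise(tokens, levels, rng), levels), tokens


if __name__ == "__main__":
    rng = random.Random(0)
    clean = [0.5, -1.2, 0.3, 0.8, 1.1]
    print(teacher_forcing_levels(len(clean), t=3))  # [0, 0, 0, 10, 10]
    print(full_sequence_levels(len(clean), rng))
    (noisy, levels), target = make_training_example(clean, rng)
    print(levels, [round(v, 2) for v in noisy])
```

In this framing, teacher forcing and full-sequence diffusion are just particular patterns of noise levels; drawing a separate level for every token is what lets a single model practise both cleaning up the present and predicting further ahead.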
The researchers validated the Diffusion Forcing technique through a series of experiments in robotics and video generation. In one experiment, the team applied the method to a robotic arm tasked with swapping two toy fruits across three circular mats. Despite visual distractions like a shopping bag obstructing its view, the robotic arm successfully completed the task, demonstrating Diffusion Forcing’s ability to filter out noisy data and make reliable decisions.
In another set of experiments, the researchers tested Diffusion Forcing on video generation, training it on Minecraft gameplay footage and on simulated environments from Google’s DeepMind Lab. Starting from a single frame, Diffusion Forcing produced higher-resolution and more stable videos than traditional diffusion models and next-token models, outperforming baselines that struggled to maintain coherence beyond 72 frames.
Finally, in a maze-solving task, the method generated faster and more accurate plans than six baseline models, showcasing its potential for long-horizon tasks like motion planning in robotics.
Overall, Diffusion Forcing provides a flexible framework for both long-term planning and variable-length sequence generation, making it valuable in diverse fields such as robotics, video generation, and AI planning. The technique's ability to handle uncertainty and adapt to new inputs could ultimately lead to advancements in how robots learn and perform complex tasks in unpredictable environments.