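Many of the breakthrough artificial intelligence (AI) applications of the past few years owe their success to a broad class of algorithms known as sequence models.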
Sequence models have played a crucial role in the development of several groundbreaking artificial intelligence (AI) applications in recent years. For instance, the algorithms that power popular large language models like Llama, ChatGPT, and Gemini belong to a specific category of sequence models that perform next-token (or word) prediction.
Text-to-video tools such as Sora are also based on sequence models, but in these cases the models predict the full output sequence rather than only the next token.
Traditionally, sequence models built for next-token prediction can generate sequences of variable length but struggle with long-term planning. Full-sequence models, by contrast, excel at long-term planning but are limited to fixed-length input and output sequences. Each class therefore carries its own trade-off: the first gives up planning, the second gives up flexible length.
Researchers at MIT CSAIL and the Technical University of Munich have proposed a novel approach called Diffusion Forcing to combine the strengths of both next-token and full-sequence models. This technique improves both the quality and adaptability of sequence models.
At its core, Diffusion Forcing builds on "Teacher Forcing," which simplifies sequence generation into smaller, manageable steps by predicting one token at a time. Diffusion Forcing introduces the concept of "fractional masking," where noise is added to the data in varying amounts, mimicking the process of partially obscuring or masking tokens. The model is then trained to remove this noise and predict the next few tokens, allowing it to simultaneously handle denoising and future predictions. This method makes the model highly adaptable to tasks involving noisy or incomplete data, enabling it to generate precise, stable outputs.
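The "fractional masking" idea can be sketched in a few lines of Python. This is a minimal illustration, not the researchers' implementation: the function names, the uniform sampling of noise levels, the unit Gaussian noise, and the square-root blending rule are all assumptions chosen for clarity. The only point it aims to show is that each token receives its own noise level, from 0 (fully visible) to 1 (fully masked).

```python
import math
import random


def noise_token(x, level, eps):
    """Blend a clean value x with noise eps according to level in [0, 1].

    level = 0.0 returns x unchanged (token fully visible);
    level = 1.0 returns eps (token fully masked).
    The square-root blend is an assumed, commonly used form.
    """
    return math.sqrt(1.0 - level) * x + math.sqrt(level) * eps


def fractionally_mask(tokens, rng):
    """Give every token its own, independently sampled noise level."""
    levels = [rng.random() for _ in tokens]        # assumed: uniform levels in [0, 1)
    noise = [rng.gauss(0.0, 1.0) for _ in tokens]  # assumed: unit Gaussian noise
    noisy = [noise_token(x, k, e) for x, k, e in zip(tokens, levels, noise)]
    return noisy, levels


if __name__ == "__main__":
    rng = random.Random(0)
    clean = [0.5, -1.0, 2.0, 0.0]  # toy stand-ins for token values
    noisy, levels = fractionally_mask(clean, rng)
    # A denoising model would receive (noisy, levels) and be trained
    # to recover `clean`; the model itself is omitted here.
    for x, k, y in zip(clean, levels, noisy):
        print(f"clean={x:+.2f}  level={k:.2f}  noisy={y:+.2f}")
```

Under these assumptions, ordinary next-token training corresponds to past tokens at level 0 and the future token at level 1, while full-sequence diffusion corresponds to one shared level for every token. Letting the level vary per token covers the range in between.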
The researchers validated the Diffusion Forcing technique through a series of experiments in robotics and video generation. In one experiment, the team applied the method to a robotic arm tasked with swapping two toy fruits across three circular mats. Despite visual distractions like a shopping bag obstructing its view, the robotic arm successfully completed the task, demonstrating Diffusion Forcing’s ability to filter out noisy data and make reliable decisions.
In another set of experiments, Diffusion Forcing was tested on video generation, where it was trained on gameplay footage from Minecraft and on simulated environments in Google's DeepMind Lab. Given a single starting frame, it produced higher-resolution and more stable videos than traditional diffusion models and next-token models, including baselines that struggled to maintain coherence beyond 72 frames.
Finally, in a maze-solving task, the method generated faster and more accurate plans than six baseline models, showcasing its potential for long-horizon tasks like motion planning in robotics.
Overall, Diffusion Forcing provides a flexible framework for both long-term planning and variable-length sequence generation, making it valuable in diverse fields such as robotics, video generation, and AI planning. The technique's ability to handle uncertainty and adapt to new inputs could ultimately lead to advancements in how robots learn and perform complex tasks in unpredictable environments.