A new AI video generation model, Pyramid Flow, was released this week, offering high-quality video clips up to 10 seconds in length — quickly, and all open source.
Developed by a collaboration of researchers from Peking University, Beijing University of Posts and Telecommunications, and Kuaishou Technology — the latter the creator of the well-reviewed proprietary Kling AI video generator — Pyramid Flow leverages a new technique wherein a single AI model generates video in stages, most of them low resolution, saving only a full-res version for the end of its generation process.
It’s available as raw code for download on Hugging Face and GitHub, and can be run in an inference shell, but this requires the user to download and run the model code on their own machine.
At inference, the model can generate a 5-second, 384p video in just 56 seconds, on par with or faster than many full-sequence diffusion counterparts. Runway’s Gen-3 Alpha Turbo still takes the cake for AI video generation speed, though, coming in at under one minute and often at 10 to 20 seconds in our tests.
We haven’t had a chance to test Pyramid Flow yet, but the videos posted by the model creators appear to be incredibly lifelike, sufficiently high in resolution, and compelling, comparable to those of proprietary offerings. Various examples are available on its GitHub project page.
Indeed, Pyramid Flow is available now to download and use, even for commercial or enterprise purposes, and is designed to compete directly with paid proprietary offerings such as Runway’s Gen-3 Alpha, Luma’s Dream Machine, Kling, and Hailuo, which can cost hundreds or even thousands of dollars a year for users on unlimited generation subscriptions.
As the race between various AI video providers to gain users continues, Pyramid Flow aims to bring more efficiency and flexibility to developers, artists, and creators seeking advanced video generation capabilities.
A new technique for high-quality AI videos: ‘pyramidal flow matching’
AI video generation is a computationally intensive task that typically involves modeling large spatiotemporal spaces. Traditional methods often require separate models for different stages of the process, which limits flexibility and increases the complexity of training.
Pyramid Flow is built on the concept of pyramidal flow matching, a method that drastically cuts down the computational cost of video generation while maintaining high visual quality, completing the video generation process as a series of “pyramid” stages, with only the final stage operating at full resolution.
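To make the staged idea concrete, here is a minimal sketch of how the number of tokens per frame shrinks at the earlier, lower-resolution stages. It is an illustration only, not the authors' implementation. The three-stage count, the halving of resolution between stages, the 16-pixel patch size, the 768x1280 frame, and the equal number of steps per stage are all assumptions made for this example, and the function names are hypothetical.

```python
def stage_resolutions(height, width, num_stages):
    """Return (height, width) per stage, coarsest first.

    Assumption: each earlier stage has half the resolution of the next one.
    """
    return [(height // 2 ** k, width // 2 ** k)
            for k in reversed(range(num_stages))]


def tokens_per_frame(height, width, patch=16):
    """Tokens for one frame, assuming non-overlapping square patches."""
    return (height // patch) * (width // patch)


def relative_token_load(height, width, num_stages, patch=16):
    """Mean token count across stages, relative to running every step at
    full resolution. Assumes each stage runs the same number of steps."""
    counts = [tokens_per_frame(h, w, patch)
              for h, w in stage_resolutions(height, width, num_stages)]
    full_res = counts[-1]  # only the last stage is full resolution
    return sum(counts) / (num_stages * full_res)


if __name__ == "__main__":
    for h, w in stage_resolutions(768, 1280, 3):
        print(f"{h}x{w}: {tokens_per_frame(h, w)} tokens per frame")
    print("relative token load:", relative_token_load(768, 1280, 3))
```

Under these assumptions, the two coarse stages handle far fewer tokens than the final one, which is the intuition behind reserving full resolution for the last stage.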
It’s described in a preprint that has not yet been peer reviewed, “Pyramidal Flow Matching for Efficient Video Generative Modeling,” submitted to the open-access preprint repository arXiv on October 8, 2024.
The authors include Yang Jin, Zhicheng Sun, Ningyuan Li, Kun Xu, Hao Jiang, Nan Zhuang, Quzhe Huang, Yang Song, Yadong Mu, and Zhouchen Lin. Most of these researchers are affiliated with Peking University, while others are from Kuaishou Technology.
As they write, the ability to compress and optimize video generation at different stages leads to faster convergence during training, allowing Pyramid Flow to generate more samples per training batch.
For example, the proposed pyramidal flow reduces the token count by a factor of four compared to traditional diffusion models, which results in more efficient training.
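As a rough illustration of why a fourfold token reduction matters, consider how compute scales with token count. This is a back-of-the-envelope sketch that assumes a transformer backbone with N tokens of width d, where self-attention cost grows with N²d and feed-forward cost grows with Nd². The article does not specify the architecture, so these scaling terms are assumptions.

```latex
% Assumption: transformer backbone, N tokens of width d.
% Self-attention cost ~ N^2 d; feed-forward cost ~ N d^2.
\[
N' = \frac{N}{4}
\quad\Longrightarrow\quad
\frac{C_{\mathrm{attn}}(N')}{C_{\mathrm{attn}}(N)}
  = \frac{(N/4)^2\, d}{N^2\, d} = \frac{1}{16},
\qquad
\frac{C_{\mathrm{ffn}}(N')}{C_{\mathrm{ffn}}(N)}
  = \frac{(N/4)\, d^2}{N\, d^2} = \frac{1}{4}.
\]
```

Under those assumptions, the attention term shrinks faster than the token count itself, while the feed-forward term shrinks in proportion to it.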
The model can produce 5- to 10-second videos at 768p resolution and 24 frames per second, all while being trained on open-source datasets. Specifically, the paper states that Pyramid Flow was trained on:
In total, the authors curated approximately 10 million single-shot videos.
However, many of these “public” or “open source” datasets have come under fire in recent years for including copyrighted material without the permission or informed consent of the copyright holders, and LAION-5B in particular has been accused of hosting child sexual abuse material.
Separately, Runway is among the companies being sued by artists in a class action lawsuit for training on materials without permission, compensation, or consent, allegedly in violation of U.S. copyright law. For now, the case is still being argued in court.
Permissively licensed, open source for commercial usage
Pyramid Flow is released under the MIT License, allowing for a wide range of uses, including commercial applications, modifications, and redistribution, provided the copyright notice is preserved.
This makes Pyramid Flow an attractive option for developers and companies looking to integrate the model into proprietary systems, and could challenge Luma AI and Runway as both look to offer paid application programming interfaces for developers seeking to integrate their proprietary AI video generation technology into customer or employee-facing apps.
Yet those proprietary models already exist as hosted inference services suitable for developers. While Pyramid Flow has a demo inference on Hugging Face, it is not suitable for building full applications atop it, and users would need to host their own version of an inference, which