A new AI video generation model, Pyramid Flow, was released this week, offering high-quality video clips up to 10 seconds in length — quickly, and all open source.
Developed by a collaboration of researchers from Peking University, Beijing University of Posts and Telecommunications, and Kuaishou Technology — the latter the creator of the well-reviewed proprietary Kling AI video generator — Pyramid Flow leverages a new technique wherein a single AI model generates video in stages, most of them low resolution, saving only a full-res version for the end of its generation process.
It’s available as raw code for download on Hugging Face and GitHub and can be run through an inference shell, but doing so requires users to download and run the model code on their own machine.
At inference, the model can generate a 5-second, 384p video in just 56 seconds, on par with or faster than many full-sequence diffusion counterparts. Runway’s Gen-3 Alpha Turbo still takes the cake for AI video generation speed, though, coming in at under one minute and often 10 to 20 seconds in our tests.
We haven’t had a chance to test Pyramid Flow yet, but the videos posted by the model creators appear incredibly lifelike, sufficiently high-resolution, and compelling, comparable to those of proprietary offerings. Various examples are available on its GitHub project page.
Indeed, Pyramid Flow is available now to download and use, even for commercial and enterprise purposes. It is designed to compete directly with paid proprietary offerings such as Runway’s Gen-3 Alpha, Luma’s Dream Machine, Kling, and Hailuo, which can cost hundreds or even thousands of dollars a year for users on unlimited generation subscriptions.
As the race between various AI video providers to gain users continues, Pyramid Flow aims to bring more efficiency and flexibility to developers, artists, and creators seeking advanced video generation capabilities.
A new technique for high-quality AI videos: ‘pyramidal flow matching’
AI video generation is a computationally intensive task that typically involves modeling large spatiotemporal spaces. Traditional methods often require separate models for different stages of the process, which limits flexibility and increases the complexity of training.
Pyramid Flow is built on pyramidal flow matching, a method that drastically cuts the computational cost of video generation while maintaining high visual quality. It runs the generation process as a series of “pyramid” stages, with only the final stage operating at full resolution.
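To make the staged approach concrete, here is a minimal Python sketch of a resolution pyramid. It is an illustration only, not the authors' code: the three-stage count, the halving of resolution between stages, and the 1280x768 frame size are all assumptions chosen for the example, and the helper names are hypothetical.

```python
def pyramid_schedule(full_height, full_width, num_stages=3, scale=2):
    """Return (height, width) for each stage, lowest resolution first.

    Assumption for illustration: each earlier stage divides the resolution
    by `scale` per side, and only the last stage is full resolution.
    """
    stages = []
    for k in range(num_stages - 1, -1, -1):
        factor = scale ** k
        stages.append((full_height // factor, full_width // factor))
    return stages


def relative_cost(stages):
    """Pixel count of each stage as a fraction of the final, full-resolution stage."""
    full_h, full_w = stages[-1]
    return [(h * w) / (full_h * full_w) for h, w in stages]


# Hypothetical 768p frame, assumed here to be 1280 pixels wide.
schedule = pyramid_schedule(768, 1280)
costs = relative_cost(schedule)

for (h, w), cost in zip(schedule, costs):
    print(f"{w}x{h}: {cost:.4f} of full-resolution pixels")
```

Under these assumptions, the two early stages handle 1/16 and 1/4 of the pixels processed by the final stage, which is where the savings would come from.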
It’s described in a preprint that has not yet been peer reviewed, “Pyramidal Flow Matching for Efficient Video Generative Modeling,” submitted to the open-access preprint repository arXiv on October 8, 2024.
The authors include Yang Jin, Zhicheng Sun, Ningyuan Li, Kun Xu, Hao Jiang, Nan Zhuang, Quzhe Huang, Yang Song, Yadong Mu, and Zhouchen Lin. Most of these researchers are affiliated with Peking University, while others are from Kuaishou Technology.
As they write, the ability to compress and optimize video generation at different stages leads to faster convergence during training, allowing Pyramid Flow to generate more samples per training batch.
For example, the proposed pyramidal flow reduces the token count by a factor of four compared to traditional diffusion models, which results in more efficient training.
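A rough way to see why fewer tokens matter: in transformer-style models, self-attention compares every token with every other token, so its cost grows with the square of the token count. The sketch below assumes such an architecture (the article does not specify one) and uses a hypothetical baseline of 16,000 tokens purely for illustration.

```python
def attention_pairs(num_tokens):
    """Number of token-to-token comparisons in one full self-attention pass."""
    return num_tokens * num_tokens


# Hypothetical baseline token count, chosen only for illustration.
baseline_tokens = 16_000
# The factor-of-four reduction described in the article.
reduced_tokens = baseline_tokens // 4

pair_ratio = attention_pairs(baseline_tokens) / attention_pairs(reduced_tokens)
print(f"{baseline_tokens} -> {reduced_tokens} tokens")
print(f"attention comparisons shrink by a factor of {pair_ratio}")  # 16.0
```

If the assumption holds, a fourfold cut in tokens shrinks the attention comparisons sixteenfold, although real-world speedups depend on the rest of the model and the hardware.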
The model can produce 5- to 10-second videos at 768p resolution and 24 frames per second, all while being trained on open-source datasets, which the paper lists.
In total, the authors curated approximately 10 million single-shot videos.
However, many of these “public” or “open source” datasets have come under fire from critics in recent years for including copyrighted material without the permission or informed consent of the copyright holders, and LAION-5B in particular has been accused of hosting child sexual abuse material.
Separately, Runway is among the companies being sued by artists in a class action lawsuit for training on materials without permission, compensation, or consent, allegedly in violation of U.S. copyright law. The case is still being argued in court.
Permissively licensed, open source for commercial usage
Pyramid Flow is released under the MIT License, allowing for a wide range of uses, including commercial applications, modifications, and redistribution, provided the copyright notice is preserved.
This makes Pyramid Flow an attractive option for developers and companies looking to integrate the model into proprietary systems. It could also challenge Luma AI and Runway, both of which are looking to offer paid application programming interfaces (APIs) to developers seeking to integrate their proprietary AI video generation technology into customer- or employee-facing apps.
Yet those proprietary models are already offered as hosted inference services suitable for developers. Pyramid Flow has a demo inference on Hugging Face, but the demo is not suitable for building full applications on top of, and users would need to host their own inference deployment.