AI-generated content has attracted considerable attention recently, but photo-realistic video synthesis remains challenging. Although many attempts using GANs and autoregressive models have been made in this area, the visual quality and length of generated videos are far from satisfactory. Diffusion models (DMs) are another class of deep generative models and have recently achieved remarkable performance on various image synthesis tasks. However, training image diffusion models usually requires substantial computational resources to achieve high performance, which makes extending diffusion models to high-dimensional video synthesis even more computationally expensive. To ease this problem while leveraging their advantages, we introduce lightweight video diffusion models that synthesize high-fidelity, arbitrarily long videos from pure noise. Specifically, we propose to perform diffusion and denoising in a low-dimensional 3D latent space, which significantly outperforms previous pixel-space methods under a limited computational budget. In addition, although trained on only tens of frames, our models can generate videos of arbitrary length, i.e., thousands of frames, in an autoregressive manner. Finally, conditional latent perturbation is introduced to reduce the performance degradation that accumulates when synthesizing long-duration videos. Extensive experiments on various datasets and generation lengths suggest that our framework samples substantially more realistic and longer videos than previous approaches, including GAN-based, autoregressive, and diffusion-based methods.
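To make the abstract's two core ideas concrete, the following is a minimal, illustrative sketch of denoising in a compact 3D latent space, extended autoregressively with conditional latent perturbation. It is not the paper's implementation: `TinyDenoiser`, the toy update rule, and the `sigma_cond` parameter are hypothetical stand-ins chosen only to show the control flow under these assumptions.

```python
import torch

# Hypothetical denoiser (NOT the paper's architecture): predicts a clean
# latent clip from a noisy one, conditioned on the latents of the
# previously generated clip, here via simple channel concatenation.
class TinyDenoiser(torch.nn.Module):
    def __init__(self, ch=4):
        super().__init__()
        self.net = torch.nn.Conv3d(2 * ch, ch, kernel_size=3, padding=1)

    def forward(self, noisy_latent, cond_latent, t):
        # t (the diffusion timestep) is ignored in this toy module.
        return self.net(torch.cat([noisy_latent, cond_latent], dim=1))

@torch.no_grad()
def sample_clip(model, cond_latent, steps=50, sigma_cond=0.1):
    """One autoregressive step: denoise a new latent clip conditioned on
    the previous clip. sigma_cond implements conditional latent
    perturbation (assumed form): small noise added to the conditioning
    latents so that sampling-time conditions resemble the noisy
    conditions seen in training, limiting error accumulation."""
    x = torch.randn_like(cond_latent)  # start from pure noise
    cond = cond_latent + sigma_cond * torch.randn_like(cond_latent)
    for i in reversed(range(steps)):
        t = torch.full((x.shape[0],), i, dtype=torch.long)
        x0_hat = model(x, cond, t)          # predicted clean latent
        alpha = i / steps                   # toy schedule, for shape flow only
        x = alpha * x + (1 - alpha) * x0_hat
    return x

# Generate an arbitrarily long latent video, clip by clip; a 3D decoder
# (omitted here) would map the latents back to pixel space.
model = TinyDenoiser()
clips = [torch.randn(1, 4, 8, 16, 16)]     # seed latent clip (B, c, t, h, w)
for _ in range(4):                          # 4 more clips -> 40 latent frames
    clips.append(sample_clip(model, clips[-1]))
latent_video = torch.cat(clips, dim=2)
print(latent_video.shape)                   # torch.Size([1, 4, 40, 16, 16])
```

The design point the sketch highlights is the `sigma_cond` perturbation: by noising the conditioning latents at sampling time, each autoregressive step becomes more tolerant of imperfections in the previously generated clip, which is what the abstract credits with reducing quality degradation over long rollouts.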