Imagen Video: High Definition Video Generation with Diffusion Models

Jonathan Ho, William Chan, Chitwan Saharia, Jay Whang, Ruiqi Gao, Alexey Gritsenko, Diederik P. Kingma, Ben Poole, Mohammad Norouzi, David J. Fleet, Tim Salimans

From arXiv. See the accompanying website: https://imagen.research.google/video/

We present Imagen Video, a text-conditional video generation system based on a cascade of video diffusion models. Given a text prompt, Imagen Video generates high definition videos using a base video generation model and a sequence of interleaved spatial and temporal video super-resolution models. We describe how we scale up the system as a high definition text-to-video model including design decisions such as the choice of fully-convolutional temporal and spatial super-resolution models at certain resolutions, and the choice of the v-parameterization of diffusion models. In addition, we confirm and transfer findings from previous work on diffusion-based image generation to the video generation setting. Finally, we apply progressive distillation to our video models with classifier-free guidance for fast, high quality sampling. We find Imagen Video not only capable of generating videos of high fidelity, but also having a high degree of controllability and world knowledge, including the ability to generate diverse videos and text animations in various artistic styles and with 3D object understanding. See https://imagen.research.google/video/ for samples.
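The abstract names two techniques that are compact enough to illustrate: the v-parameterization of diffusion models and classifier-free guidance. The following is a minimal sketch based on how these are commonly defined in the diffusion literature, not the authors' implementation. The cosine schedule, the function names, and the guidance convention `(1 + w) * cond - w * uncond` are assumptions chosen for illustration; the abstract itself does not specify them.

```python
import math

def alpha_sigma(t):
    """Assumed variance-preserving cosine schedule: alpha_t^2 + sigma_t^2 = 1."""
    return math.cos(0.5 * math.pi * t), math.sin(0.5 * math.pi * t)

def diffuse(x, eps, t):
    """Forward process: z_t = alpha_t * x + sigma_t * eps."""
    a, s = alpha_sigma(t)
    return [a * xi + s * ei for xi, ei in zip(x, eps)]

def v_target(x, eps, t):
    """v-parameterization target: v = alpha_t * eps - sigma_t * x."""
    a, s = alpha_sigma(t)
    return [a * ei - s * xi for xi, ei in zip(x, eps)]

def x_from_v(z, v, t):
    """Recover the clean sample: x = alpha_t * z_t - sigma_t * v."""
    a, s = alpha_sigma(t)
    return [a * zi - s * vi for zi, vi in zip(z, v)]

def eps_from_v(z, v, t):
    """Recover the noise: eps = sigma_t * z_t + alpha_t * v."""
    a, s = alpha_sigma(t)
    return [s * zi + a * vi for zi, vi in zip(z, v)]

def guided(cond, uncond, w):
    """Classifier-free guidance (assumed convention): (1 + w) * cond - w * uncond."""
    return [(1.0 + w) * c - w * u for c, u in zip(cond, uncond)]

# Toy round trip on a two-element "sample" (hypothetical values).
x = [0.5, -1.0]
eps = [0.3, 0.2]
t = 0.4
z = diffuse(x, eps, t)
v = v_target(x, eps, t)
x_rec = x_from_v(z, v, t)
eps_rec = eps_from_v(z, v, t)
```

In this sketch a network trained to predict `v` can be converted back to a prediction of either `x` or `eps` at any noise level, and with `w = 0` the guided prediction reduces to the conditional prediction.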

