Text-driven image and video diffusion models have recently achieved unprecedented generation realism. While diffusion models have been successfully applied to image editing, very few works have done so for video editing. We present the first diffusion-based method that is able to perform text-based motion and appearance editing of general videos. Our approach uses a video diffusion model to combine, at inference time, the low-resolution spatio-temporal information from the original video with new, high-resolution information that it synthesizes to align with the guiding text prompt. As obtaining high fidelity to the original video requires retaining some of its high-resolution information, we add a preliminary stage of finetuning the model on the original video, significantly boosting fidelity. We propose to improve motion editability through a new, mixed objective that jointly finetunes with full temporal attention and with temporal attention masking. We further introduce a new framework for image animation: we first transform the image into a coarse video by simple image processing operations such as replication and perspective geometric projections, and then use our general video editor to animate it. As a further application, we can use our method for subject-driven video generation. Extensive qualitative and numerical experiments showcase the remarkable editing ability of our method and establish its superior performance compared to baseline methods.
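To make the image-animation step concrete, the following is a minimal sketch, assuming NumPy and OpenCV, of how a single image could be turned into a coarse video by frame replication plus progressively stronger perspective warps. The function and parameter names (make_coarse_video, num_frames, max_shift) are illustrative assumptions, not from the paper's implementation.

```python
# Hypothetical sketch of the image-to-coarse-video step: replicate a single
# image into frames and apply a slightly stronger perspective warp to each
# frame, faking simple camera motion. Names and parameters are illustrative.
import numpy as np
import cv2

def make_coarse_video(image: np.ndarray, num_frames: int = 16, max_shift: float = 0.05) -> np.ndarray:
    """Return an array of shape (num_frames, H, W, C) containing `image`
    replicated with a progressively stronger perspective warp."""
    h, w = image.shape[:2]
    src = np.float32([[0, 0], [w, 0], [w, h], [0, h]])  # image corners
    frames = []
    for t in range(num_frames):
        alpha = t / max(num_frames - 1, 1)  # warp strength in [0, 1]
        dx = alpha * max_shift * w          # corner displacement in pixels
        dst = np.float32([[dx, 0], [w, 0.5 * dx], [w - dx, h], [0, h - 0.5 * dx]])
        H = cv2.getPerspectiveTransform(src, dst)
        frames.append(cv2.warpPerspective(image, H, (w, h), borderMode=cv2.BORDER_REFLECT))
    return np.stack(frames)
```

In the pipeline described above, only the low-resolution spatio-temporal information of such a coarse video is retained at inference time; the video diffusion model synthesizes the high-resolution details and motion that align with the guiding text prompt.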