Zero-Shot Video Editing Using Off-The-Shelf Image Diffusion Models - Zhuanzhi (专知) Papers


Video · Zero-shot · Diffusion models · Samples · Dynamics

March 30, 2023

Zero-Shot Video Editing Using Off-The-Shelf Image Diffusion Models


Wen Wang, Kangyang Xie, Zide Liu, Hao Chen, Yue Cao, Xinlong Wang, Chunhua Shen

Large-scale text-to-image diffusion models achieve unprecedented success in image generation and editing. However, how to extend such success to video editing is unclear. Recent initial attempts at video editing require significant text-to-video data and computation resources for training, which is often not accessible. In this work, we propose vid2vid-zero, a simple yet effective method for zero-shot video editing. Our vid2vid-zero leverages off-the-shelf image diffusion models, and doesn't require training on any video. At the core of our method is a null-text inversion module for text-to-video alignment, a cross-frame modeling module for temporal consistency, and a spatial regularization module for fidelity to the original video. Without any training, we leverage the dynamic nature of the attention mechanism to enable bi-directional temporal modeling at test time. Experiments and analyses show promising results in editing attributes, subjects, places, etc., in real-world videos. Code will be made available at https://github.com/baaivision/vid2vid-zero.
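The abstract's idea of bi-directional temporal modeling through attention can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function name `cross_frame_attention`, the tensor shapes, and the choice to let every frame attend to all frames are assumptions made for illustration only. The point it shows is that attention needs no new weights to mix frames, so an image model's attention layer can, in principle, be applied across frames at test time.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_frame_attention(q, k, v):
    """Hypothetical cross-frame attention over a short clip.

    q, k, v have shape (frames, tokens, dim). Each frame's queries
    attend to the keys and values of every frame, so information
    flows both forward and backward in time without any training.
    """
    frames, tokens, dim = q.shape
    # Pool keys and values from all frames: (frames * tokens, dim).
    k_all = k.reshape(frames * tokens, dim)
    v_all = v.reshape(frames * tokens, dim)
    # Scaled dot-product scores: (frames, tokens, frames * tokens).
    scores = q @ k_all.T / np.sqrt(dim)
    weights = softmax(scores, axis=-1)
    # Weighted sum of pooled values: (frames, tokens, dim).
    return weights @ v_all


# Toy inputs: 4 frames, 16 tokens per frame, 8 channels.
rng = np.random.default_rng(0)
q = rng.normal(size=(4, 16, 8))
k = rng.normal(size=(4, 16, 8))
v = rng.normal(size=(4, 16, 8))
out = cross_frame_attention(q, k, v)
print(out.shape)
```

Because the keys and values are pooled over all frames, changing the last frame also changes the output for the first frame, which is the bi-directional behaviour the abstract refers to. Attending to every frame costs memory quadratic in the clip length, so a practical system would likely restrict which frames are pooled.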

