Diffusion-based generative models have achieved remarkable success in text-based image generation. However, since the generation process involves enormous randomness, it remains challenging to apply such models to real-world visual content editing, especially in videos. In this paper, we propose FateZero, a zero-shot text-based editing method for real-world videos that requires neither per-prompt training nor user-specified masks. To edit videos consistently, we propose several techniques built on pre-trained models. First, in contrast to straightforward DDIM inversion, our approach captures intermediate attention maps during inversion, which effectively retain both structural and motion information. These maps are fused directly in the editing process rather than regenerated during denoising. To further minimize semantic leakage from the source video, we then fuse the self-attention maps using a blending mask obtained from the cross-attention features of the source prompt. Furthermore, we reform the self-attention mechanism in the denoising UNet by introducing spatial-temporal attention to ensure frame consistency. Though succinct, our method is the first to demonstrate zero-shot text-driven video style and local attribute editing using a pre-trained text-to-image model. It also achieves better zero-shot shape-aware editing when built on a pre-trained text-to-video model. Extensive experiments demonstrate superior temporal consistency and editing capability compared with previous works.
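To illustrate the attention-fusion idea described above, the following is a minimal sketch (not the authors' implementation) of how self-attention maps stored during DDIM inversion might be blended with those computed during editing, using a mask derived from the source prompt's cross-attention. All function and parameter names (`blend_self_attention`, `token_idx`, `tau`) are hypothetical, and the tensor shapes are assumed for illustration only.

```python
import torch

def blend_self_attention(attn_src, attn_edit, cross_attn_src, token_idx, tau=0.3):
    """Hypothetical sketch of attention blending with a cross-attention mask.

    attn_src:       self-attention maps stored during DDIM inversion of the
                    source video, shape (heads, seq_len, seq_len)
    attn_edit:      self-attention maps computed during editing/denoising,
                    same shape as attn_src
    cross_attn_src: cross-attention maps from the source prompt,
                    shape (heads, seq_len, num_tokens)
    token_idx:      indices of the edited word(s) in the source prompt
    tau:            threshold turning averaged cross-attention into a binary mask
    """
    # Average cross-attention over heads and the selected tokens, then
    # threshold to obtain a spatial mask of the region to be edited.
    mask = cross_attn_src[..., token_idx].mean(dim=(0, -1))   # (seq_len,)
    mask = (mask / mask.max() > tau).float()                  # binary mask

    # Inside the mask: keep the newly generated attention (the edit).
    # Outside the mask: reuse the inverted source attention to preserve
    # the structure and motion of the original video.
    mask = mask.view(1, -1, 1)                                # broadcast over rows
    return mask * attn_edit + (1.0 - mask) * attn_src
```

Under these assumptions, the blended maps would replace the self-attention maps in the denoising UNet at each timestep, so edits are confined to the masked region while the rest of the frame follows the source video's attention.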