视频油漆的空间- 时间变换器 (Decoupled Spatial-Temporal Transformer for Video Inpainting) - 专知论文

会员服务 ·

0

图像修复 · Better · 变换 · 块 · 学成 ·

2021 年 4 月 14 日

Decoupled Spatial-Temporal Transformer for Video Inpainting

翻译：视频油漆的空间- 时间变换器

Rui Liu,Hanming Deng,Yangyi Huang,Xiaoyu Shi,Lewei Lu,Wenxiu Sun,Xiaogang Wang,Jifeng Dai,Hongsheng Li

Video inpainting aims to fill the given spatiotemporal holes with realistic appearance but is still a challenging task even with prosperous deep learning approaches. Recent works introduce the promising Transformer architecture into deep video inpainting and achieve better performance. However, it still suffers from synthesizing blurry texture as well as huge computational cost. Towards this end, we propose a novel Decoupled Spatial-Temporal Transformer (DSTT) for improving video inpainting with exceptional efficiency. Our proposed DSTT disentangles the task of learning spatial-temporal attention into 2 sub-tasks: one is for attending temporal object movements on different frames at same spatial locations, which is achieved by temporally-decoupled Transformer block, and the other is for attending similar background textures on same frame of all spatial positions, which is achieved by spatially-decoupled Transformer block. The interweaving stack of such two blocks makes our proposed model attend background textures and moving objects more precisely, and thus the attended plausible and temporally-coherent appearance can be propagated to fill the holes. In addition, a hierarchical encoder is adopted before the stack of Transformer blocks, for learning robust and hierarchical features that maintain multi-level local spatial structure, resulting in the more representative token vectors. Seamless combination of these two novel designs forms a better spatial-temporal attention scheme and our proposed model achieves better performance than state-of-the-art video inpainting approaches with significant boosted efficiency.

翻译：视频绘画的目的是以现实的外观填补特定时空洞,但即使采用繁荣的深层学习方法,它仍然是一个具有挑战性的任务。最近的工作将充满希望的变异器结构引入深层视频涂色和取得更好的性能。然而,它仍然受到模糊的纹理合成的影响以及巨大的计算成本的困扰。为此,我们提议了一个新的脱co的时空变异器(DSTT),用于以超乎寻常的效率改进视频涂色。我们提议的DSTT将学习时空注意力的任务分为两个子任务:一个是在同一空间位置上观看不同框架的时态变异器结构,通过时间分层变异的变异体块(DSTTT),在不同的空间变异时空变异体结构上,我们提议的变异体变异体变体变形模型(DSTTTTTTT), 使得我们提议的变形模型更准确地使用背景纹理和移动对象的方法,因此,在不同的空间变异体结构结构中,一个更好的等级结构结构将更牢固的变形结构,从而将更稳定地在结构结构结构结构中,将更稳定地保持。

0

相关内容

图像修复

图像修复（英语：Inpainting）指重建的图像和视频中丢失或损坏的部分的过程。例如在博物馆中，这项工作常由经验丰富的博物馆管理员或者艺术品修复师来进行。数码世界中，图像修复又称图像插值或视频插值，指利用复杂的算法来替换已丢失、损坏的图像数据，主要替换一些小区域和瑕疵。

【CVPR2021】基于Transformers 从序列到序列的角度重新思考语义分割

【CVPR2021】基于Transformers 从序列到序列的角度重新思考语义分割

专知会员服务

44+阅读 · 2021年3月15日

【AAAI2021】时空融合图神经网络的交通流预测

专知会员服务

110+阅读 · 2020年12月22日

最新《Transformers模型》教程，64页ppt

最新《Transformers模型》教程，64页ppt

专知会员服务

321+阅读 · 2020年11月26日

【CVPR2020】语义增强的场景文本识别的编码-解码器框架，SEED: Semantics Enhanced Encoder-Decoder Framework for Scene Text Recognition

【CVPR2020】语义增强的场景文本识别的编码-解码器框架，SEED: Semantics Enhanced Encoder-Decoder Framework for Scene Text Recognition

专知会员服务

25+阅读 · 2020年5月22日

【CVPR2020-微软-CMU】视频物体分割的一种直推方法，Video Object Segmentation

【CVPR2020-微软-CMU】视频物体分割的一种直推方法，Video Object Segmentation

专知会员服务

7+阅读 · 2020年4月16日

【CVPR2020-香港中文大学】PointGroup:用于3D实例分割的双设置点分组，PointGroup: Dual-Set Point Grouping for 3D Instance Segmentation

【CVPR2020-香港中文大学】PointGroup:用于3D实例分割的双设置点分组，PointGroup: Dual-Set Point Grouping for 3D Instance Segmentation

专知会员服务

12+阅读 · 2020年4月6日

【华盛顿大学】用于视觉和语言导航的多视图学习，Multi-View Learning for Vision-and-Language Navigation

【华盛顿大学】用于视觉和语言导航的多视图学习，Multi-View Learning for Vision-and-Language Navigation

专知会员服务

31+阅读 · 2020年3月11日

基于动态时空图CNNs的交通流预测，Dynamic Spatio-temporal Graph-based CNNs for Traffic Flow Prediction

基于动态时空图CNNs的交通流预测，Dynamic Spatio-temporal Graph-based CNNs for Traffic Flow Prediction

专知会员服务

136+阅读 · 2020年3月8日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

【泡泡汇总】CVPR2019 SLAM Paperlist

【泡泡汇总】CVPR2019 SLAM Paperlist

泡泡机器人SLAM

14+阅读 · 2019年6月12日

BERT/Transformer/迁移学习NLP资源大列表

BERT/Transformer/迁移学习NLP资源大列表

专知

19+阅读 · 2019年6月9日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

【泡泡一分钟】优化对比度增强以提高SLAM重定位环境中视觉跟踪的稳健性

【泡泡一分钟】优化对比度增强以提高SLAM重定位环境中视觉跟踪的稳健性

泡泡机器人SLAM

10+阅读 · 2019年4月26日

CVPR2018 | Decoupled Networks

CVPR2018 | Decoupled Networks

极市平台

4+阅读 · 2019年3月22日

【泡泡一分钟】基于机器人的视觉惯性里程计（IROS2018-10）

【泡泡一分钟】基于机器人的视觉惯性里程计（IROS2018-10）

泡泡机器人SLAM

13+阅读 · 2019年1月3日

【泡泡一分钟】基于视频修复的时空转换网络

【泡泡一分钟】基于视频修复的时空转换网络

泡泡机器人SLAM

5+阅读 · 2018年12月30日

【泡泡点云时空】PU-Net：点云上采样网络（CVPR2018-6）

【泡泡点云时空】PU-Net：点云上采样网络（CVPR2018-6）

泡泡机器人SLAM

6+阅读 · 2018年8月16日

【泡泡一分钟】基于姿态不变的特征嵌入及时空正则化的车辆重识别(ICCV2017-38)

【泡泡一分钟】基于姿态不变的特征嵌入及时空正则化的车辆重识别(ICCV2017-38)

泡泡机器人SLAM

4+阅读 · 2018年6月19日

【论文推荐】最新四篇CVPR2018 视频描述生成相关论文—双向注意力、Transformer、重构网络、层次强化学习

【论文推荐】最新四篇CVPR2018 视频描述生成相关论文—双向注意力、Transformer、重构网络、层次强化学习

专知

31+阅读 · 2018年6月4日

Associating Objects with Transformers for Video Object Segmentation

Arxiv

0+阅读 · 2021年6月4日

End-to-End Video Instance Segmentation with Transformers

Arxiv

10+阅读 · 2021年3月24日

Multimodal Semantic Attention Network for Video Captioning

Arxiv

4+阅读 · 2019年5月8日

Progressive Sparse Local Attention for Video object detection

Arxiv

4+阅读 · 2019年3月21日

Video Object Detection with an Aligned Spatial-Temporal Memory

Video Object Detection with an Aligned Spatial-Temporal Memory

Arxiv

4+阅读 · 2018年7月27日

Video Captioning with Boundary-aware Hierarchical Language Decoding and Joint Video Prediction

Video Captioning with Boundary-aware Hierarchical Language Decoding and Joint Video Prediction

Arxiv

4+阅读 · 2018年7月8日

Bidirectional Attentive Fusion with Context Gating for Dense Video Captioning

Arxiv

5+阅读 · 2018年4月3日

End-to-End Dense Video Captioning with Masked Transformer

Arxiv

14+阅读 · 2018年4月3日

Video Person Re-identification by Temporal Residual Learning

Arxiv

5+阅读 · 2018年2月22日

Spatial-Temporal Memory Networks for Video Object Detection

Arxiv

4+阅读 · 2017年12月18日

VIP会员

文章信息

相关主题

相关VIP内容

【CVPR2021】基于Transformers 从序列到序列的角度重新思考语义分割

【CVPR2021】基于Transformers 从序列到序列的角度重新思考语义分割

专知会员服务

44+阅读 · 2021年3月15日

【AAAI2021】时空融合图神经网络的交通流预测

专知会员服务

110+阅读 · 2020年12月22日

最新《Transformers模型》教程，64页ppt

最新《Transformers模型》教程，64页ppt

专知会员服务

321+阅读 · 2020年11月26日

【CVPR2020】语义增强的场景文本识别的编码-解码器框架，SEED: Semantics Enhanced Encoder-Decoder Framework for Scene Text Recognition

【CVPR2020】语义增强的场景文本识别的编码-解码器框架，SEED: Semantics Enhanced Encoder-Decoder Framework for Scene Text Recognition

专知会员服务

25+阅读 · 2020年5月22日

【CVPR2020-微软-CMU】视频物体分割的一种直推方法，Video Object Segmentation

【CVPR2020-微软-CMU】视频物体分割的一种直推方法，Video Object Segmentation

专知会员服务

7+阅读 · 2020年4月16日

【CVPR2020-香港中文大学】PointGroup:用于3D实例分割的双设置点分组，PointGroup: Dual-Set Point Grouping for 3D Instance Segmentation

【CVPR2020-香港中文大学】PointGroup:用于3D实例分割的双设置点分组，PointGroup: Dual-Set Point Grouping for 3D Instance Segmentation

专知会员服务

12+阅读 · 2020年4月6日

【华盛顿大学】用于视觉和语言导航的多视图学习，Multi-View Learning for Vision-and-Language Navigation

【华盛顿大学】用于视觉和语言导航的多视图学习，Multi-View Learning for Vision-and-Language Navigation

专知会员服务

31+阅读 · 2020年3月11日

基于动态时空图CNNs的交通流预测，Dynamic Spatio-temporal Graph-based CNNs for Traffic Flow Prediction

基于动态时空图CNNs的交通流预测，Dynamic Spatio-temporal Graph-based CNNs for Traffic Flow Prediction

专知会员服务

136+阅读 · 2020年3月8日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Stabilizing Transformers for Reinforcement Learning

Stabilizing Transformers for Reinforcement Learning

专知会员服务

60+阅读 · 2019年10月17日

热门VIP内容

开通专知VIP会员享更多权益服务

扩散语言模型综述

《美陆军徒步机动作战条令手册》最新168页

【博士论文】理解神经网络的训练动态：从局部优化轨迹与特征学习视角

军事后勤数字化未来展望

相关资讯

【泡泡汇总】CVPR2019 SLAM Paperlist

【泡泡汇总】CVPR2019 SLAM Paperlist

泡泡机器人SLAM

14+阅读 · 2019年6月12日

BERT/Transformer/迁移学习NLP资源大列表

BERT/Transformer/迁移学习NLP资源大列表

专知

19+阅读 · 2019年6月9日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

【泡泡一分钟】优化对比度增强以提高SLAM重定位环境中视觉跟踪的稳健性

【泡泡一分钟】优化对比度增强以提高SLAM重定位环境中视觉跟踪的稳健性

泡泡机器人SLAM

10+阅读 · 2019年4月26日

CVPR2018 | Decoupled Networks

CVPR2018 | Decoupled Networks

极市平台

4+阅读 · 2019年3月22日

【泡泡一分钟】基于机器人的视觉惯性里程计（IROS2018-10）

【泡泡一分钟】基于机器人的视觉惯性里程计（IROS2018-10）

泡泡机器人SLAM

13+阅读 · 2019年1月3日

【泡泡一分钟】基于视频修复的时空转换网络

【泡泡一分钟】基于视频修复的时空转换网络

泡泡机器人SLAM

5+阅读 · 2018年12月30日

【泡泡点云时空】PU-Net：点云上采样网络（CVPR2018-6）

【泡泡点云时空】PU-Net：点云上采样网络（CVPR2018-6）

泡泡机器人SLAM

6+阅读 · 2018年8月16日

【泡泡一分钟】基于姿态不变的特征嵌入及时空正则化的车辆重识别(ICCV2017-38)

【泡泡一分钟】基于姿态不变的特征嵌入及时空正则化的车辆重识别(ICCV2017-38)

泡泡机器人SLAM

4+阅读 · 2018年6月19日

【论文推荐】最新四篇CVPR2018 视频描述生成相关论文—双向注意力、Transformer、重构网络、层次强化学习

【论文推荐】最新四篇CVPR2018 视频描述生成相关论文—双向注意力、Transformer、重构网络、层次强化学习

专知

31+阅读 · 2018年6月4日

相关论文

Associating Objects with Transformers for Video Object Segmentation

Arxiv

0+阅读 · 2021年6月4日

End-to-End Video Instance Segmentation with Transformers

Arxiv

10+阅读 · 2021年3月24日

Multimodal Semantic Attention Network for Video Captioning

Arxiv

4+阅读 · 2019年5月8日

Progressive Sparse Local Attention for Video object detection

Arxiv

4+阅读 · 2019年3月21日

Video Object Detection with an Aligned Spatial-Temporal Memory

Video Object Detection with an Aligned Spatial-Temporal Memory

Arxiv

4+阅读 · 2018年7月27日

Video Captioning with Boundary-aware Hierarchical Language Decoding and Joint Video Prediction

Video Captioning with Boundary-aware Hierarchical Language Decoding and Joint Video Prediction

Arxiv

4+阅读 · 2018年7月8日

Bidirectional Attentive Fusion with Context Gating for Dense Video Captioning

Arxiv

5+阅读 · 2018年4月3日

End-to-End Dense Video Captioning with Masked Transformer

Arxiv

14+阅读 · 2018年4月3日

Video Person Re-identification by Temporal Residual Learning

Arxiv

5+阅读 · 2018年2月22日

Spatial-Temporal Memory Networks for Video Object Detection

Arxiv

4+阅读 · 2017年12月18日

微信扫码咨询专知VIP会员