TransVOS: Video Object Segmentation with Transformers - Zhuanzhi (专知) Papers


September 18, 2021

TransVOS: Video Object Segmentation with Transformers


Jianbiao Mei,Mengmeng Wang,Yeneng Lin,Yi Yuan,Yong Liu

from arxiv, 9 pages, 2 figures

Recently, Space-Time Memory Network (STM) based methods have achieved state-of-the-art performance in semi-supervised video object segmentation (VOS). A crucial problem in this task is how to model the dependency both among different frames and inside every frame. However, most of these methods neglect the spatial relationships (inside each frame) and do not make full use of the temporal relationships (among different frames). In this paper, we propose a new transformer-based framework, termed TransVOS, introducing a vision transformer to fully exploit and model both the temporal and spatial relationships. Moreover, most STM-based approaches employ two separate encoders to extract features of two significant inputs, i.e., reference sets (history frames with predicted masks) and query frame (current frame), respectively, increasing the models' parameters and complexity. To slim the popular two-encoder pipeline while keeping the effectiveness, we design a single two-path feature extractor to encode the above two inputs in a unified way. Extensive experiments demonstrate the superiority of our TransVOS over state-of-the-art methods on both DAVIS and YouTube-VOS datasets.
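The abstract does not give implementation details, so the following is only an illustrative sketch of the general idea of modeling spatial and temporal relationships jointly, not the authors' TransVOS code. It assumes that per-frame feature maps from the reference set and the query frame are flattened into one token sequence, and it uses plain single-head dot-product self-attention with identity projections (no learned weights). The names `make_frame`, `flatten_frames`, and `self_attention` are hypothetical helpers introduced for this example.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def make_frame(height, width, channels):
    """Hypothetical stand-in for an encoder output: an H x W grid of C-dim features."""
    return [[[random.gauss(0.0, 1.0) for _ in range(channels)]
             for _ in range(width)]
            for _ in range(height)]


def flatten_frames(frames):
    """Turn T frames of H x W feature vectors into one sequence of T*H*W tokens."""
    return [pixel for frame in frames for row in frame for pixel in row]


def self_attention(tokens):
    """Single-head scaled dot-product self-attention with identity projections.

    Every token attends to every other token, so a token from the query frame
    can draw on positions in its own frame (spatial) and in the reference
    frames (temporal) in the same operation.
    """
    dim = len(tokens[0])
    scale = 1.0 / math.sqrt(dim)
    outputs, weights = [], []
    for query in tokens:
        scores = [scale * sum(q * k for q, k in zip(query, key)) for key in tokens]
        row = softmax(scores)
        weights.append(row)
        outputs.append([sum(w * value[c] for w, value in zip(row, tokens))
                        for c in range(dim)])
    return outputs, weights


random.seed(0)
T, H, W, C = 2, 2, 2, 4  # assumed toy sizes: 2 reference frames, 2x2 maps, 4 channels

reference_set = [make_frame(H, W, C) for _ in range(T)]
query_frame = make_frame(H, W, C)

# Reference tokens come first, query-frame tokens last.
tokens = flatten_frames(reference_set + [query_frame])
outputs, weights = self_attention(tokens)

# Split the attention of the first query-frame token into its two parts.
first_query_row = weights[T * H * W]
temporal_mass = sum(first_query_row[:T * H * W])  # weight on reference frames
spatial_mass = sum(first_query_row[T * H * W:])   # weight on the query frame itself

print(f"tokens: {len(tokens)}, temporal: {temporal_mass:.3f}, spatial: {spatial_mass:.3f}")
```

In this toy setup the two attention masses always sum to one, which makes it easy to see how much a given query-frame position relies on the history frames versus its own frame. A real model would add learned query, key, and value projections, multiple heads, and positional encodings.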


