移动设备上的视频实例分割：MobileInst (MobileInst: Video Instance Segmentation on the Mobile) - 专知论文

会员服务 ·

0

视频实例分割 · 实例分割 · 分割 · 移动设备 · YouTube ·

2023 年 3 月 30 日

MobileInst: Video Instance Segmentation on the Mobile

翻译：移动设备上的视频实例分割：MobileInst

Renhong Zhang,Tianheng Cheng,Shusheng Yang,Haoyi Jiang,Shuai Zhang,Jiancheng Lyu,Xin Li,Xiaowen Ying,Dashan Gao,Wenyu Liu,Xinggang Wang

from arxiv, Preprint. 12 pages, 7 figures

Although recent approaches aiming for video instance segmentation have achieved promising results, it is still difficult to employ those approaches for real-world applications on mobile devices, which mainly suffer from (1) heavy computation and memory cost and (2) complicated heuristics for tracking objects. To address those issues, we present MobileInst, a lightweight and mobile-friendly framework for video instance segmentation on mobile devices. Firstly, MobileInst adopts a mobile vision transformer to extract multi-level semantic features and presents an efficient query-based dual-transformer instance decoder for mask kernels and a semantic-enhanced mask decoder to generate instance segmentation per frame. Secondly, MobileInst exploits simple yet effective kernel reuse and kernel association to track objects for video instance segmentation. Further, we propose temporal query passing to enhance the tracking ability for kernels. We conduct experiments on COCO and YouTube-VIS datasets to demonstrate the superiority of MobileInst and evaluate the inference latency on a mobile CPU core of Qualcomm Snapdragon-778G, without other methods of acceleration. On the COCO dataset, MobileInst achieves 30.5 mask AP and 176 ms on the mobile CPU, which reduces the latency by 50% compared to the previous SOTA. For video instance segmentation, MobileInst achieves 35.0 AP on YouTube-VIS 2019 and 30.1 AP on YouTube-VIS 2021. Code will be available to facilitate real-world applications and future research.

翻译：虽然最近针对视频实例分割的方法取得了很好的结果，但在移动设备上难以应用于实际应用，主要表现为：（1）计算和内存成本较高，（2）追踪对象的启发式策略比较复杂。为了解决这些问题，我们提出了 MobileInst，这是一个针对移动设备的轻量级和友好型视频实例分割框架。首先，MobileInst采用一个移动视觉变换器来提取多级语义特征，并提出了一种高效的基于查询的双变换器实例解码器来进行掩膜内核和语义增强掩膜解码器的查询，以生成每一帧的实例分割。其次，MobileInst利用简单但有效的核重用和核关联来跟踪对象进行视频实例分割。此外，我们提出了时间查询传递，以增强内核的跟踪能力。我们在COCO和YouTube-VIS数据集上进行了实验，以展示MobileInst的优越性，并评估了在高通骁龙778G移动CPU核上的推理延迟，不用其他的加速方法。在COCO数据集上，MobileInst实现了30.5个掩膜AP和176毫秒的移动CPU核延迟，相比之前的最佳结果减少了50%。针对视频实例分割，MobileInst在YouTube-VIS 2019上实现了35.0的AP，在YouTube-VIS 2021上实现了30.1的AP。代码将提供以便于实际应用和未来研究。

0

相关内容

视频实例分割

视频实例分割

【CVPR 2022】基于实例深度估计的统一深度感知全景分割 PanopticDepth: Per-Instance Depth Estimation for Unified Depth-Aware Panoptic Segmentation

【CVPR 2022】基于实例深度估计的统一深度感知全景分割 PanopticDepth: Per-Instance Depth Estimation for Unified Depth-Aware Panoptic Segmentation

专知会员服务

18+阅读 · 2022年3月19日

【Hugging Face】使用自定义数据集微调语义分割模型，Fine-Tune a Semantic Segmentation Model with a Custom Dataset

【Hugging Face】使用自定义数据集微调语义分割模型，Fine-Tune a Semantic Segmentation Model with a Custom Dataset

专知会员服务

21+阅读 · 2022年3月18日

【CVPR 2022】基于Tracklet查询和建议的高效视频实例分割，Efficient Video Instance Segmentation via Tracklet Query and Proposal

【CVPR 2022】基于Tracklet查询和建议的高效视频实例分割，Efficient Video Instance Segmentation via Tracklet Query and Proposal

专知会员服务

16+阅读 · 2022年3月3日

【CVPR 2022】使用多模态Transformer的端到端视频对象分割，End-to-End Referring Video Object Segmentation with Multimodal Transformer

【CVPR 2022】使用多模态Transformer的端到端视频对象分割，End-to-End Referring Video Object Segmentation with Multimodal Transformer

专知会员服务

28+阅读 · 2022年3月3日

【视频目标检测与跟踪：综述论文】Video Object Segmentation and Tracking: A Survey

专知会员服务

66+阅读 · 2020年6月4日

50+篇《神经架构搜索NAS》2020论文合集

专知会员服务

61+阅读 · 2020年3月19日

【香港中文大学-CVPR2020】Rotate-and-Render: Unsupervised Photorealistic Face Rotation from Single-View Images

【香港中文大学-CVPR2020】Rotate-and-Render: Unsupervised Photorealistic Face Rotation from Single-View Images

专知会员服务

22+阅读 · 2020年3月18日

抢鲜看！13篇CVPR2020论文链接/开源代码/解读

抢鲜看！13篇CVPR2020论文链接/开源代码/解读

专知会员服务

50+阅读 · 2020年2月26日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

直播 | Interpretable and Trustworthy Graph Geometric Deep Learning

直播 | Interpretable and Trustworthy Graph Geometric Deep Learning

图与推荐

2+阅读 · 2022年11月2日

VCIP 2022 Call for Demos

VCIP 2022 Call for Demos

CCF多媒体专委会

1+阅读 · 2022年6月6日

【泡泡一分钟】三维卷积神经网络实现实时非模态三维目标检测

【泡泡一分钟】三维卷积神经网络实现实时非模态三维目标检测

泡泡机器人SLAM

12+阅读 · 2019年5月20日

Github项目推荐 | 语义分割、实例分割、全景分割和视频分割的论文和基准列表

Github项目推荐 | 语义分割、实例分割、全景分割和视频分割的论文和基准列表

AI研习社

32+阅读 · 2019年4月5日

【泡泡一分钟】DS-SLAM: 动态环境下的语义视觉SLAM

【泡泡一分钟】DS-SLAM: 动态环境下的语义视觉SLAM

泡泡机器人SLAM

23+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

【泡泡点云时空】基于增量分割的3D点云定位方法（ICRA2018-4）

【泡泡点云时空】基于增量分割的3D点云定位方法（ICRA2018-4）

泡泡机器人SLAM

13+阅读 · 2018年10月7日

【论文推荐】最新5篇目标跟踪（Object Tracking）相关论文—并行跟踪和验证、光流、自动跟踪、相关滤波集成、CFNet

【论文推荐】最新5篇目标跟踪（Object Tracking）相关论文—并行跟踪和验证、光流、自动跟踪、相关滤波集成、CFNet

专知

25+阅读 · 2018年2月6日

【推荐】(TensorFlow)SSD实时手部检测与追踪（附代码）

【推荐】(TensorFlow)SSD实时手部检测与追踪（附代码）

机器学习研究会

11+阅读 · 2017年12月5日

Survivin在低氧诱导喉癌淋巴管生成中的调控作用及其分子机制

国家自然科学基金

0+阅读 · 2015年12月31日

非小细胞肺癌中黄连素干预TF/FVIIa通路抑制转移的作用研究

国家自然科学基金

0+阅读 · 2014年12月31日

microRNA在缺血再灌注致急性肾损伤中的作用及机制研究

国家自然科学基金

0+阅读 · 2014年12月31日

面向协同的设计重用启发模型

国家自然科学基金

0+阅读 · 2013年12月31日

面向移动IPTV直播频道可用度的优化技术研究

国家自然科学基金

0+阅读 · 2013年12月31日

基于几何操作转换的异构三维CAD系统协同方法与关键技术

国家自然科学基金

0+阅读 · 2013年12月31日

Fibulin-5/β1-integrin 信号通路在醛固酮诱导血管平滑肌细胞凋亡中的作用

国家自然科学基金

0+阅读 · 2012年12月31日

基于多性能退化参数的装备关键系统实时可靠性评估与预测方法研究

国家自然科学基金

1+阅读 · 2012年12月31日

面向实时逼真绘制的非传统相机模型研究

国家自然科学基金

0+阅读 · 2012年12月31日

超精度视频内容三维重建

国家自然科学基金

0+阅读 · 2011年12月31日

Enhancing Transformer Backbone for Egocentric Video Action Segmentation

Arxiv

0+阅读 · 2023年5月19日

Advancing Incremental Few-shot Semantic Segmentation via Semantic-guided Relation Alignment and Adaptation

Arxiv

0+阅读 · 2023年5月18日

Multimodal Short Video Rumor Detection System Based on Contrastive Learning

Arxiv

0+阅读 · 2023年5月17日

End-to-End Video Instance Segmentation with Transformers

Arxiv

10+阅读 · 2021年3月24日

Image/Video Deep Anomaly Detection: A Survey

Arxiv

16+阅读 · 2021年3月2日

Reverse Attention for Salient Object Detection

Arxiv

11+阅读 · 2019年4月15日

3D Backbone Network for 3D Object Detection

Arxiv

12+阅读 · 2019年1月24日

Deep Learning for Generic Object Detection: A Survey

Deep Learning for Generic Object Detection: A Survey

Arxiv

14+阅读 · 2018年9月6日

Mobile Video Object Detection with Temporally-Aware Feature Maps

Arxiv

11+阅读 · 2018年3月28日

An application of cascaded 3D fully convolutional networks for medical image segmentation

Arxiv

10+阅读 · 2018年3月20日

VIP会员

文章信息

相关主题

视频实例分割

相关VIP内容

【CVPR 2022】基于实例深度估计的统一深度感知全景分割 PanopticDepth: Per-Instance Depth Estimation for Unified Depth-Aware Panoptic Segmentation

【CVPR 2022】基于实例深度估计的统一深度感知全景分割 PanopticDepth: Per-Instance Depth Estimation for Unified Depth-Aware Panoptic Segmentation

专知会员服务

18+阅读 · 2022年3月19日

【Hugging Face】使用自定义数据集微调语义分割模型，Fine-Tune a Semantic Segmentation Model with a Custom Dataset

【Hugging Face】使用自定义数据集微调语义分割模型，Fine-Tune a Semantic Segmentation Model with a Custom Dataset

专知会员服务

21+阅读 · 2022年3月18日

【CVPR 2022】基于Tracklet查询和建议的高效视频实例分割，Efficient Video Instance Segmentation via Tracklet Query and Proposal

【CVPR 2022】基于Tracklet查询和建议的高效视频实例分割，Efficient Video Instance Segmentation via Tracklet Query and Proposal

专知会员服务

16+阅读 · 2022年3月3日

【CVPR 2022】使用多模态Transformer的端到端视频对象分割，End-to-End Referring Video Object Segmentation with Multimodal Transformer

【CVPR 2022】使用多模态Transformer的端到端视频对象分割，End-to-End Referring Video Object Segmentation with Multimodal Transformer

专知会员服务

28+阅读 · 2022年3月3日

【视频目标检测与跟踪：综述论文】Video Object Segmentation and Tracking: A Survey

专知会员服务

66+阅读 · 2020年6月4日

50+篇《神经架构搜索NAS》2020论文合集

专知会员服务

61+阅读 · 2020年3月19日

【香港中文大学-CVPR2020】Rotate-and-Render: Unsupervised Photorealistic Face Rotation from Single-View Images

【香港中文大学-CVPR2020】Rotate-and-Render: Unsupervised Photorealistic Face Rotation from Single-View Images

专知会员服务

22+阅读 · 2020年3月18日

抢鲜看！13篇CVPR2020论文链接/开源代码/解读

抢鲜看！13篇CVPR2020论文链接/开源代码/解读

专知会员服务

50+阅读 · 2020年2月26日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

热门VIP内容

开通专知VIP会员享更多权益服务

现代战争的杀伤区：规模结构、控制手段、生存与战线转移

中文版 | 人工智能时代的任务式指挥

中文版 | 数据投毒：AI驱动战争中优势地位的隐蔽武器

以色列在加沙战争部署新型军事人工智能

相关资讯

直播 | Interpretable and Trustworthy Graph Geometric Deep Learning

直播 | Interpretable and Trustworthy Graph Geometric Deep Learning

图与推荐

2+阅读 · 2022年11月2日

VCIP 2022 Call for Demos

VCIP 2022 Call for Demos

CCF多媒体专委会

1+阅读 · 2022年6月6日

【泡泡一分钟】三维卷积神经网络实现实时非模态三维目标检测

【泡泡一分钟】三维卷积神经网络实现实时非模态三维目标检测

泡泡机器人SLAM

12+阅读 · 2019年5月20日

Github项目推荐 | 语义分割、实例分割、全景分割和视频分割的论文和基准列表

Github项目推荐 | 语义分割、实例分割、全景分割和视频分割的论文和基准列表

AI研习社

32+阅读 · 2019年4月5日

【泡泡一分钟】DS-SLAM: 动态环境下的语义视觉SLAM

【泡泡一分钟】DS-SLAM: 动态环境下的语义视觉SLAM

泡泡机器人SLAM

23+阅读 · 2019年1月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

【泡泡点云时空】基于增量分割的3D点云定位方法（ICRA2018-4）

【泡泡点云时空】基于增量分割的3D点云定位方法（ICRA2018-4）

泡泡机器人SLAM

13+阅读 · 2018年10月7日

【论文推荐】最新5篇目标跟踪（Object Tracking）相关论文—并行跟踪和验证、光流、自动跟踪、相关滤波集成、CFNet

【论文推荐】最新5篇目标跟踪（Object Tracking）相关论文—并行跟踪和验证、光流、自动跟踪、相关滤波集成、CFNet

专知

25+阅读 · 2018年2月6日

【推荐】(TensorFlow)SSD实时手部检测与追踪（附代码）

【推荐】(TensorFlow)SSD实时手部检测与追踪（附代码）

机器学习研究会

11+阅读 · 2017年12月5日

相关论文

Enhancing Transformer Backbone for Egocentric Video Action Segmentation

Arxiv

0+阅读 · 2023年5月19日

Advancing Incremental Few-shot Semantic Segmentation via Semantic-guided Relation Alignment and Adaptation

Arxiv

0+阅读 · 2023年5月18日

Multimodal Short Video Rumor Detection System Based on Contrastive Learning

Arxiv

0+阅读 · 2023年5月17日

End-to-End Video Instance Segmentation with Transformers

Arxiv

10+阅读 · 2021年3月24日

Image/Video Deep Anomaly Detection: A Survey

Arxiv

16+阅读 · 2021年3月2日

Reverse Attention for Salient Object Detection

Arxiv

11+阅读 · 2019年4月15日

3D Backbone Network for 3D Object Detection

Arxiv

12+阅读 · 2019年1月24日

Deep Learning for Generic Object Detection: A Survey

Deep Learning for Generic Object Detection: A Survey

Arxiv

14+阅读 · 2018年9月6日

Mobile Video Object Detection with Temporally-Aware Feature Maps

Arxiv

11+阅读 · 2018年3月28日

An application of cascaded 3D fully convolutional networks for medical image segmentation

Arxiv

10+阅读 · 2018年3月20日

相关基金

Survivin在低氧诱导喉癌淋巴管生成中的调控作用及其分子机制

国家自然科学基金

0+阅读 · 2015年12月31日

非小细胞肺癌中黄连素干预TF/FVIIa通路抑制转移的作用研究

国家自然科学基金

0+阅读 · 2014年12月31日

microRNA在缺血再灌注致急性肾损伤中的作用及机制研究

国家自然科学基金

0+阅读 · 2014年12月31日

面向协同的设计重用启发模型

国家自然科学基金

0+阅读 · 2013年12月31日

面向移动IPTV直播频道可用度的优化技术研究

国家自然科学基金

0+阅读 · 2013年12月31日

基于几何操作转换的异构三维CAD系统协同方法与关键技术

国家自然科学基金

0+阅读 · 2013年12月31日

Fibulin-5/β1-integrin 信号通路在醛固酮诱导血管平滑肌细胞凋亡中的作用

国家自然科学基金

0+阅读 · 2012年12月31日

基于多性能退化参数的装备关键系统实时可靠性评估与预测方法研究

国家自然科学基金

1+阅读 · 2012年12月31日

面向实时逼真绘制的非传统相机模型研究

国家自然科学基金

0+阅读 · 2012年12月31日

超精度视频内容三维重建

国家自然科学基金

0+阅读 · 2011年12月31日

微信扫码咨询专知VIP会员