Video-kMaX: A Simple Unified Approach for Online and Near-Online Video Panoptic Segmentation - Zhuanzhi Papers


Segmentation · Panoptic Segmentation · Online · Association · Video

April 10, 2023

Video-kMaX: A Simple Unified Approach for Online and Near-Online Video Panoptic Segmentation


Inkyu Shin, Dahun Kim, Qihang Yu, Jun Xie, Hong-Seok Kim, Bradley Green, In So Kweon, Kuk-Jin Yoon, Liang-Chieh Chen

Video Panoptic Segmentation (VPS) aims to achieve comprehensive pixel-level scene understanding by segmenting all pixels and associating objects in a video. Current solutions can be categorized into online and near-online approaches. Having evolved over time, each category has its own specialized designs, making it nontrivial to adapt models between categories. To alleviate this discrepancy, we propose a unified approach for online and near-online VPS. The meta architecture of the proposed Video-kMaX consists of two components: a within-clip segmenter (for clip-level segmentation) and a cross-clip associater (for association beyond clips). We propose clip-kMaX (clip k-means mask transformer) and HiLA-MB (Hierarchical Location-Aware Memory Buffer) to instantiate the segmenter and associater, respectively. Our general formulation includes the online scenario as a special case by adopting a clip length of one. Without bells and whistles, Video-kMaX sets a new state-of-the-art on KITTI-STEP and VIPSeg for video panoptic segmentation, and on VSPW for video semantic segmentation. Code will be made publicly available.
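The two-component meta architecture described in the abstract can be pictured with a small sketch. The code below is only an illustration under stated assumptions, not the authors' clip-kMaX or HiLA-MB: the names `split_into_clips` and `MemoryAssociater` are hypothetical, clips are assumed to be non-overlapping, the segmenter is assumed to output one embedding per object in a clip, and the association step is a plain greedy cosine-similarity match against a flat memory buffer rather than a hierarchical, location-aware one. It does show how a clip length of one reduces the same pipeline to the online setting.

```python
from math import sqrt


def split_into_clips(num_frames, clip_len):
    """Group frame indices into consecutive, non-overlapping clips (an assumption).

    clip_len == 1 corresponds to the online setting; clip_len > 1 to near-online.
    """
    if clip_len < 1:
        raise ValueError("clip_len must be at least 1")
    return [
        list(range(start, min(start + clip_len, num_frames)))
        for start in range(0, num_frames, clip_len)
    ]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class MemoryAssociater:
    """Hypothetical cross-clip associater: a flat memory of object embeddings."""

    def __init__(self, threshold=0.5):
        self.memory = {}          # global track id -> most recent embedding
        self.threshold = threshold
        self.next_id = 0

    def associate(self, clip_objects):
        """Map each clip-local object id to a global track id.

        clip_objects: dict of local id -> embedding, as a within-clip
        segmenter might produce (assumed interface).
        """
        # Score every (clip object, remembered track) pair, best first.
        pairs = sorted(
            (
                (cosine(emb, mem), local_id, global_id)
                for local_id, emb in clip_objects.items()
                for global_id, mem in self.memory.items()
            ),
            reverse=True,
        )
        assigned, used = {}, set()
        for score, local_id, global_id in pairs:
            if score < self.threshold:
                break  # remaining pairs are even less similar
            if local_id in assigned or global_id in used:
                continue
            assigned[local_id] = global_id
            used.add(global_id)
        # Unmatched objects start new tracks; all tracks refresh their memory.
        for local_id, emb in clip_objects.items():
            if local_id not in assigned:
                assigned[local_id] = self.next_id
                self.next_id += 1
            self.memory[assigned[local_id]] = emb
        return assigned
```

In this sketch, calling `split_into_clips(n, 1)` yields one frame per clip, so the associater is consulted on every frame, which mirrors the online special case mentioned in the abstract; larger clip lengths leave within-clip consistency to the segmenter and use the memory only across clip boundaries.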

