Traditional video action detectors typically adopt a two-stage pipeline, where a person detector is first employed to generate actor boxes and then 3D RoIAlign is used to extract actor-specific features for classification. This detection paradigm requires multi-stage training and inference, and cannot capture context information outside the bounding box. Recently, a few query-based action detectors have been proposed to predict action instances in an end-to-end manner. However, they still lack adaptability in feature sampling and decoding, and thus suffer from inferior performance or slow convergence. In this paper, we propose a new one-stage sparse action detector, termed STMixer. STMixer is based on two core designs. First, we present a query-based adaptive feature sampling module, which endows our STMixer with the flexibility of mining a set of discriminative features from the entire spatiotemporal domain. Second, we devise a dual-branch feature mixing module, which allows our STMixer to dynamically attend to and mix video features along the spatial and temporal dimensions respectively for better feature decoding. Coupling these two designs with a video backbone yields an efficient end-to-end action detector. Without bells and whistles, our STMixer obtains state-of-the-art results on the AVA, UCF101-24, and JHMDB datasets.
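The two core designs can be illustrated with a minimal sketch. This is not the paper's actual STMixer implementation; all shapes, weight matrices, and variable names below are hypothetical, and the sampling is simplified to nearest-neighbor lookup. The sketch only shows the data flow: each query predicts free-form (t, y, x) sampling locations over the whole spatiotemporal volume (rather than RoI-aligned features inside a box), and query-conditioned dynamic weights then mix the sampled features along two separate axes.

```python
import numpy as np

# Hypothetical shapes, for illustration only.
T, H, W, C = 8, 16, 16, 64   # spatiotemporal feature volume from a video backbone
N, P = 4, 32                  # N queries, each sampling P points
D = 32                        # query embedding dimension

rng = np.random.default_rng(0)
features = rng.standard_normal((T, H, W, C))
queries = rng.standard_normal((N, D))

# --- Adaptive feature sampling (sketch) ---
# Each query linearly predicts P normalized (t, y, x) locations, so sampling
# can reach anywhere in the volume, not just inside an actor box.
W_loc = rng.standard_normal((D, P * 3)) * 0.01
loc = 1.0 / (1.0 + np.exp(-(queries @ W_loc)))           # squash to (0, 1)
loc = loc.reshape(N, P, 3) * np.array([T - 1, H - 1, W - 1])
idx = np.round(loc).astype(int)                           # nearest-neighbor sampling
sampled = features[idx[..., 0], idx[..., 1], idx[..., 2]]  # (N, P, C)

# --- Dual-branch dynamic mixing (sketch) ---
# Query-conditioned weights mix the sampled features along the channel axis
# and along the sampled-point axis, standing in for the spatial and temporal
# mixing branches described in the abstract.
W_ch = rng.standard_normal((D, C * C)) * 0.01
W_pt = rng.standard_normal((D, P * P)) * 0.01
M_ch = (queries @ W_ch).reshape(N, C, C)                  # per-query channel mixer
M_pt = (queries @ W_pt).reshape(N, P, P)                  # per-query point mixer
mixed = np.einsum('npc,ncd->npd', sampled, M_ch)          # mix across channels
mixed = np.einsum('npc,npq->nqc', mixed, M_pt)            # mix across points
print(mixed.shape)  # (4, 32, 64)
```

The decoded per-query features (`mixed`) would then feed classification and localization heads; in the real detector, sampling and mixing are learned end-to-end with the backbone.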