以微缩提案探索高品质时间行动 (Towards High-Quality Temporal Action Detection with Sparse Proposals) - 专知论文

会员服务 ·

0

INTERACT · Performer · 稀疏 · Extensibility · 可辨认的 ·

2021 年 9 月 18 日

Towards High-Quality Temporal Action Detection with Sparse Proposals

翻译：以微缩提案探索高品质时间行动

Jiannan Wu,Peize Sun,Shoufa Chen,Jiewen Yang,Zihao Qi,Lan Ma,Ping Luo

from arxiv, 10 pages, 5 figures

Temporal Action Detection (TAD) is an essential and challenging topic in video understanding, aiming to localize the temporal segments containing human action instances and predict the action categories. The previous works greatly rely upon dense candidates either by designing varying anchors or enumerating all the combinations of boundaries on video sequences; therefore, they are related to complicated pipelines and sensitive hand-crafted designs. Recently, with the resurgence of Transformer, query-based methods have tended to become the rising solutions for their simplicity and flexibility. However, there still exists a performance gap between query-based methods and well-established methods. In this paper, we identify the main challenge lies in the large variants of action duration and the ambiguous boundaries for short action instances; nevertheless, quadratic-computational global attention prevents query-based methods to build multi-scale feature maps. Towards high-quality temporal action detection, we introduce Sparse Proposals to interact with the hierarchical features. In our method, named SP-TAD, each proposal attends to a local segment feature in the temporal feature pyramid. The local interaction enables utilization of high-resolution features to preserve action instances details. Extensive experiments demonstrate the effectiveness of our method, especially under high tIoU thresholds. E.g., we achieve the state-of-the-art performance on THUMOS14 (45.7% on mAP@0.6, 33.4% on mAP@0.7 and 53.5% on mAP@Avg) and competitive results on ActivityNet-1.3 (32.99% on mAP@Avg). Code will be made available at https://github.com/wjn922/SP-TAD.

翻译：时间行动探测(TAD)是视频理解中一个重要而具有挑战性的主题,目的是将包含人类行动实例的时间段本地化,并预测行动类别。之前的工作在很大程度上依赖密集候选人,或者设计不同的锚或列举视频序列上的所有边界组合;因此,它们与复杂的管道和敏感的手工制作设计有关。最近,随着变压器的恢复,基于查询的方法往往成为其简单性和灵活性的不断上升的解决方案。然而,在基于查询的方法和既定方法之间仍然存在一种绩效差距。在本文中,我们确定的主要挑战在于行动期限的大型变异和短期行动实例的模糊界限;然而,四角转换式全球关注阻止了基于查询的方法来建立多级地貌地图。在高品质的时间探测中,我们引入了微缩建议,在我们的方法中,称为SP-TAD,每一项建议都包含时间特征金字塔中的局部部分特征。本地互动使得高分辨率特征能够用于维护行动实例;但是,在53°A+A上,我们做了高比例的实验。

0

相关内容

INTERACT

IFIP TC13 Conference on Human-Computer Interaction是人机交互领域的研究者和实践者展示其工作的重要平台。多年来，这些会议吸引了来自几个国家和文化的研究人员。官网链接：http://interact2019.org/

最新《Transformers模型》教程，64页ppt

最新《Transformers模型》教程，64页ppt

专知会员服务

321+阅读 · 2020年11月26日

【KDD2020】动态图的拉普拉斯变换点检测，Laplacian Change Point Detection for Dynamic Graphs

【KDD2020】动态图的拉普拉斯变换点检测，Laplacian Change Point Detection for Dynamic Graphs

专知会员服务

38+阅读 · 2020年7月3日

【AAAI2020】Context-Transformer:上下文转换器:解决对象混淆的小样本检测，Context-Transformer: Tackling Object Confusion for Few-Shot Detection

【AAAI2020】Context-Transformer:上下文转换器:解决对象混淆的小样本检测，Context-Transformer: Tackling Object Confusion for Few-Shot Detection

专知会员服务

51+阅读 · 2020年3月17日

【厦门大学-CVPR2020】协调可迁移性与可判别性的自适应目标检测器，Adapting Object Detectors

【厦门大学-CVPR2020】协调可迁移性与可判别性的自适应目标检测器，Adapting Object Detectors

专知会员服务

26+阅读 · 2020年3月16日

深度强化学习策略梯度教程，53页ppt

深度强化学习策略梯度教程，53页ppt

专知会员服务

184+阅读 · 2020年2月1日

【Google AI新论文EfficientDet】规模化高效化的物体检测，EfficientDet: Scalable and Efficient Object Detection(附pdf)

【Google AI新论文EfficientDet】规模化高效化的物体检测，EfficientDet: Scalable and Efficient Object Detection(附pdf)

专知会员服务

27+阅读 · 2019年11月24日

【ICCV2019教程】物体检测的R-CNN通用框架，The Generalized R-CNN Framework for Object Detection，180页ppt，Facebook 人工智能研究院Ross Girshick大神

【ICCV2019教程】物体检测的R-CNN通用框架，The Generalized R-CNN Framework for Object Detection，180页ppt，Facebook 人工智能研究院Ross Girshick大神

专知会员服务

25+阅读 · 2019年11月16日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

简评 | Video Action Recognition 的近期进展

简评 | Video Action Recognition 的近期进展

极市平台

20+阅读 · 2019年4月21日

已删除

将门创投

5+阅读 · 2019年4月4日

论文笔记之Feature Selective Networks for Object Detection

论文笔记之Feature Selective Networks for Object Detection

统计学习与视觉计算组

21+阅读 · 2018年7月26日

MS-TCT: Multi-Scale Temporal ConvTransformer for Action Detection

Arxiv

0+阅读 · 2021年12月7日

DCAN: Improving Temporal Action Detection via Dual Context Aggregation

Arxiv

0+阅读 · 2021年12月7日

TubeR: Tubelet Transformer for Video Action Detection

Arxiv

0+阅读 · 2021年12月6日

Temporal Context Aggregation Network for Temporal Action Proposal Refinement

Temporal Context Aggregation Network for Temporal Action Proposal Refinement

Arxiv

5+阅读 · 2021年3月24日

Triple-cooperative Video Shadow Detection

Arxiv

6+阅读 · 2021年3月11日

Voxel R-CNN: Towards High Performance Voxel-based 3D Object Detection

Arxiv

4+阅读 · 2020年12月31日

Object Detection in Videos by High Quality Object Linking

Arxiv

4+阅读 · 2019年4月8日

Progressive Sparse Local Attention for Video object detection

Arxiv

4+阅读 · 2019年3月21日

Towards Human-Machine Cooperation: Self-supervised Sample Mining for Object Detection

Arxiv

6+阅读 · 2018年3月27日

Object Detection in Videos by Short and Long Range Object Linking

Arxiv

6+阅读 · 2018年1月30日

VIP会员

文章信息

相关主题

相关VIP内容

最新《Transformers模型》教程，64页ppt

最新《Transformers模型》教程，64页ppt

专知会员服务

321+阅读 · 2020年11月26日

【KDD2020】动态图的拉普拉斯变换点检测，Laplacian Change Point Detection for Dynamic Graphs

【KDD2020】动态图的拉普拉斯变换点检测，Laplacian Change Point Detection for Dynamic Graphs

专知会员服务

38+阅读 · 2020年7月3日

【AAAI2020】Context-Transformer:上下文转换器:解决对象混淆的小样本检测，Context-Transformer: Tackling Object Confusion for Few-Shot Detection

【AAAI2020】Context-Transformer:上下文转换器:解决对象混淆的小样本检测，Context-Transformer: Tackling Object Confusion for Few-Shot Detection

专知会员服务

51+阅读 · 2020年3月17日

【厦门大学-CVPR2020】协调可迁移性与可判别性的自适应目标检测器，Adapting Object Detectors

【厦门大学-CVPR2020】协调可迁移性与可判别性的自适应目标检测器，Adapting Object Detectors

专知会员服务

26+阅读 · 2020年3月16日

深度强化学习策略梯度教程，53页ppt

深度强化学习策略梯度教程，53页ppt

专知会员服务

184+阅读 · 2020年2月1日

【Google AI新论文EfficientDet】规模化高效化的物体检测，EfficientDet: Scalable and Efficient Object Detection(附pdf)

【Google AI新论文EfficientDet】规模化高效化的物体检测，EfficientDet: Scalable and Efficient Object Detection(附pdf)

专知会员服务

27+阅读 · 2019年11月24日

【ICCV2019教程】物体检测的R-CNN通用框架，The Generalized R-CNN Framework for Object Detection，180页ppt，Facebook 人工智能研究院Ross Girshick大神

【ICCV2019教程】物体检测的R-CNN通用框架，The Generalized R-CNN Framework for Object Detection，180页ppt，Facebook 人工智能研究院Ross Girshick大神

专知会员服务

25+阅读 · 2019年11月16日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

《乌克兰无人机产业：志愿者与政策在构建新兴无人机产业中的协同作用》最新报告

《人工智能辅助决策中的数据可视化：系统性综述》

人工智能驱动弹药制造现代化：美国陆军转型之路

《敏捷作战部署中枢纽-辐条基地选址优化研究》80页

相关资讯

简评 | Video Action Recognition 的近期进展

简评 | Video Action Recognition 的近期进展

极市平台

20+阅读 · 2019年4月21日

已删除

将门创投

5+阅读 · 2019年4月4日

论文笔记之Feature Selective Networks for Object Detection

论文笔记之Feature Selective Networks for Object Detection

统计学习与视觉计算组

21+阅读 · 2018年7月26日

相关论文

MS-TCT: Multi-Scale Temporal ConvTransformer for Action Detection

Arxiv

0+阅读 · 2021年12月7日

DCAN: Improving Temporal Action Detection via Dual Context Aggregation

Arxiv

0+阅读 · 2021年12月7日

TubeR: Tubelet Transformer for Video Action Detection

Arxiv

0+阅读 · 2021年12月6日

Temporal Context Aggregation Network for Temporal Action Proposal Refinement

Temporal Context Aggregation Network for Temporal Action Proposal Refinement

Arxiv

5+阅读 · 2021年3月24日

Triple-cooperative Video Shadow Detection

Arxiv

6+阅读 · 2021年3月11日

Voxel R-CNN: Towards High Performance Voxel-based 3D Object Detection

Arxiv

4+阅读 · 2020年12月31日

Object Detection in Videos by High Quality Object Linking

Arxiv

4+阅读 · 2019年4月8日

Progressive Sparse Local Attention for Video object detection

Arxiv

4+阅读 · 2019年3月21日

Towards Human-Machine Cooperation: Self-supervised Sample Mining for Object Detection

Arxiv

6+阅读 · 2018年3月27日

Object Detection in Videos by Short and Long Range Object Linking

Arxiv

6+阅读 · 2018年1月30日

微信扫码咨询专知VIP会员