Temporal action detection (TAD) aims to determine the semantic label and the temporal boundaries of every action instance in an untrimmed video. It is a fundamental task in video understanding, and significant progress has been made on it. Previous methods involve multiple stages or networks and hand-designed rules or operations, which limits their efficiency and flexibility. Here, we construct an end-to-end Transformer-based framework for TAD, termed \textit{TadTR}, which predicts all action instances in parallel as a set of labels and temporal locations. TadTR adaptively extracts the temporal context needed for action prediction by selectively attending to a small number of snippets in the video. It greatly simplifies the TAD pipeline and runs much faster than previous detectors. Our method achieves state-of-the-art performance on HACS Segments and THUMOS14 and competitive performance on ActivityNet-1.3. Our code will be made available at \url{https://github.com/xlliu7/TadTR}.
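A minimal PyTorch sketch of the two ideas stated above: (1) each action query attends only to a small, learned set of sampled snippets along the temporal axis, and (2) a fixed set of queries predicts all action instances in parallel as (class, center, width). This is an illustrative assumption of how such components could look, not the authors' implementation; all module and parameter names (e.g. \texttt{TemporalDeformableAttention}, \texttt{SetPredictionHead}, \texttt{n\_points}) are hypothetical.

\begin{verbatim}
# Illustrative sketch only; names and shapes are assumptions, not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TemporalDeformableAttention(nn.Module):
    """Each query attends to a few sampled snippets around a reference point."""
    def __init__(self, dim, n_points=4):
        super().__init__()
        self.n_points = n_points
        self.offset = nn.Linear(dim, n_points)   # where to sample (relative offsets)
        self.weight = nn.Linear(dim, n_points)   # contribution of each sampled snippet
        self.value = nn.Linear(dim, dim)

    def forward(self, query, video_feat, ref):
        # query: (B, Q, C) action queries; video_feat: (B, T, C); ref: (B, Q) in [0, 1]
        B, T, C = video_feat.shape
        val = self.value(video_feat)                          # (B, T, C)
        offsets = self.offset(query)                          # (B, Q, P)
        attn = F.softmax(self.weight(query), dim=-1)          # (B, Q, P)
        # temporal sampling locations, clamped to the valid range
        loc = (ref.unsqueeze(-1) + offsets / T).clamp(0, 1) * (T - 1)
        lo, hi = loc.floor().long(), loc.ceil().long()
        w = loc - lo.float()
        gather = lambda idx: torch.gather(
            val, 1, idx.reshape(B, -1, 1).expand(-1, -1, C)
        ).reshape(B, -1, self.n_points, C)
        # linear interpolation between neighboring snippet features
        sampled = (1 - w).unsqueeze(-1) * gather(lo) + w.unsqueeze(-1) * gather(hi)
        return (attn.unsqueeze(-1) * sampled).sum(dim=2)      # (B, Q, C)

class SetPredictionHead(nn.Module):
    """Predicts a label and a normalized (center, width) segment per query."""
    def __init__(self, dim, num_classes):
        super().__init__()
        self.cls = nn.Linear(dim, num_classes + 1)   # +1 for the "no action" class
        self.loc = nn.Linear(dim, 2)                 # normalized (center, width)

    def forward(self, query_feat):
        return self.cls(query_feat), self.loc(query_feat).sigmoid()
\end{verbatim}

In such a design, all queries are decoded in one forward pass, so no proposal generation, sliding windows, or non-maximum-suppression-style post-processing is required to produce the set of detections.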