In this report, we describe in detail our approach to the ActivityNet 2018 Kinetics-600 challenge. Although spatial-temporal modelling methods have been proposed for this task in the existing state of the art, adopting either an end-to-end framework such as I3D \cite{i3d} or a two-stage framework (i.e., CNN+RNN), video modelling is still far from being well solved. For this challenge, we propose the spatial-temporal network (StNet) for better joint spatial-temporal modelling and more comprehensive video understanding. In addition, since video sources contain multi-modal information, we integrate both early-fusion and late-fusion strategies for multi-modal information via our proposed improved temporal Xception network (iTXN). Our single StNet RGB model achieves 78.99\% top-1 precision on the Kinetics-600 validation set, and our improved temporal Xception network, which integrates the RGB, flow, and audio modalities, reaches 82.35\%. After model ensembling, we achieve a top-1 precision as high as 85.0\% on the validation set and rank No.1 among all submissions.