运动引导关注融合以识别视频互动 (Motion Guided Attention Fusion to Recognize Interactions from Videos) - 专知论文

会员服务 ·

0

INTERACT · state-of-the-art · 注意力机制 · Performer · 目标检测 ·

2021 年 4 月 1 日

Motion Guided Attention Fusion to Recognize Interactions from Videos

翻译：运动引导关注融合以识别视频互动

Tae Soo Kim,Jonathan Jones,Gregory D. Hager

We present a dual-pathway approach for recognizing fine-grained interactions from videos. We build on the success of prior dual-stream approaches, but make a distinction between the static and dynamic representations of objects and their interactions explicit by introducing separate motion and object detection pathways. Then, using our new Motion-Guided Attention Fusion module, we fuse the bottom-up features in the motion pathway with features captured from object detections to learn the temporal aspects of an action. We show that our approach can generalize across appearance effectively and recognize actions where an actor interacts with previously unseen objects. We validate our approach using the compositional action recognition task from the Something-Something-v2 dataset where we outperform existing state-of-the-art methods. We also show that our method can generalize well to real world tasks by showing state-of-the-art performance on recognizing humans assembling various IKEA furniture on the IKEA-ASM dataset.

翻译：我们展示了一种双路径方法,以识别视频中细微的相互作用。我们以先前的双流方法的成功为基础,但通过引入单独的运动和物体探测路径,区分物体静态和动态的表达方式及其显露的相互作用。然后,我们利用我们新的运动-引导引力融合模块,将运动路径中自下而上的特点与从物体探测中采集的特征结合起来,以了解一项行动的时间方面。我们表明,我们的方法可以有效地将外观加以概括,并承认一个行为者与以前看不见的物体发生相互作用时的行动。我们用某些东西- 点数- V2 数据集的组合动作识别任务来验证我们的方法,在这些数据集中,我们优于现有的最新技术方法。我们还表明,我们的方法可以通过在识别在IKEA- ASM数据集上组装各种IKEA家具的人时展示最先进的表现,从而将现实世界任务概括化。

1

相关内容

INTERACT

IFIP TC13 Conference on Human-Computer Interaction是人机交互领域的研究者和实践者展示其工作的重要平台。多年来，这些会议吸引了来自几个国家和文化的研究人员。官网链接：http://interact2019.org/

[CVPR 2021] 序列到序列对比学习的文本识别

[CVPR 2021] 序列到序列对比学习的文本识别

专知会员服务

29+阅读 · 2021年4月14日

近期必读的5篇顶会CVPR 2021【行为识别】相关论文和代码

专知会员服务

60+阅读 · 2021年3月17日

最新《模仿学习 - Imitation Learning》教程，63页ppt，微软Kamil Ciosek

最新《模仿学习 - Imitation Learning》教程，63页ppt，微软Kamil Ciosek

专知会员服务

66+阅读 · 2020年8月22日

【CVPR2020-微软-CMU】视频物体分割的一种直推方法，Video Object Segmentation

【CVPR2020-微软-CMU】视频物体分割的一种直推方法，Video Object Segmentation

专知会员服务

7+阅读 · 2020年4月16日

CVPR 2020 论文开源项目合集

专知会员服务

110+阅读 · 2020年3月12日

【中科院自动化所】序列到序列语音识别的无监督预训练（Unsupervised pre-training for sequence to sequence speech recognition）

【中科院自动化所】序列到序列语音识别的无监督预训练（Unsupervised pre-training for sequence to sequence speech recognition）

专知会员服务

33+阅读 · 2020年1月5日

【论文】使用编码器进行命名实体识别（TENER: Adapting Transformer Encoder for Named Entity Recognition）

【论文】使用编码器进行命名实体识别（TENER: Adapting Transformer Encoder for Named Entity Recognition）

专知会员服务

52+阅读 · 2019年12月28日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

ICCV 2019 行为识别/视频理解论文汇总

ICCV 2019 行为识别/视频理解论文汇总

极市平台

15+阅读 · 2019年9月26日

「Github」多模态机器学习文章阅读列表

「Github」多模态机器学习文章阅读列表

专知

123+阅读 · 2019年8月15日

无人机视觉挑战赛 | ICCV 2019 Workshop—VisDrone2019

无人机视觉挑战赛 | ICCV 2019 Workshop—VisDrone2019

PaperWeekly

7+阅读 · 2019年5月5日

简评 | Video Action Recognition 的近期进展

简评 | Video Action Recognition 的近期进展

极市平台

20+阅读 · 2019年4月21日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

【跟踪Tracking】15篇论文+代码 | 中秋快乐~

【跟踪Tracking】15篇论文+代码 | 中秋快乐~

专知

18+阅读 · 2018年9月24日

【论文推荐】最新四篇CVPR2018 视频描述生成相关论文—双向注意力、Transformer、重构网络、层次强化学习

【论文推荐】最新四篇CVPR2018 视频描述生成相关论文—双向注意力、Transformer、重构网络、层次强化学习

专知

31+阅读 · 2018年6月4日

【论文推荐】最新五篇视频分类相关论文—细粒度行人识别、群组归一化、MLtuner、时序特征

【论文推荐】最新五篇视频分类相关论文—细粒度行人识别、群组归一化、MLtuner、时序特征

专知

22+阅读 · 2018年4月21日

暗通沟渠：Multi-lingual Attention

暗通沟渠：Multi-lingual Attention

我爱读PAMI

7+阅读 · 2018年2月24日

Attributes-Guided and Pure-Visual Attention Alignment for Few-Shot Recognition

Arxiv

8+阅读 · 2020年12月4日

Scene-based Factored Attention for Image Captioning

Arxiv

4+阅读 · 2019年8月7日

Object-aware Aggregation with Bidirectional Temporal Graph for Video Captioning

Arxiv

3+阅读 · 2019年6月11日

An Attention Enhanced Graph Convolutional LSTM Network for Skeleton-Based Action Recognition

Arxiv

9+阅读 · 2019年3月29日

Relation-aware Graph Attention Network for Visual Question Answering

Arxiv

4+阅读 · 2019年3月29日

Global-and-local attention networks for visual recognition

Global-and-local attention networks for visual recognition

Arxiv

5+阅读 · 2018年9月6日

Video-to-Video Synthesis

Video-to-Video Synthesis

Arxiv

9+阅读 · 2018年8月20日

Learning Human Pose Models from Synthesized Data for Robust RGB-D Action Recognition

Arxiv

3+阅读 · 2018年5月1日

Question Type Guided Attention in Visual Question Answering

Arxiv

5+阅读 · 2018年4月6日

Reconstruction Network for Video Captioning

Arxiv

5+阅读 · 2018年3月30日

VIP会员

文章信息

相关主题

state-of-the-art

注意力机制

相关VIP内容

[CVPR 2021] 序列到序列对比学习的文本识别

[CVPR 2021] 序列到序列对比学习的文本识别

专知会员服务

29+阅读 · 2021年4月14日

近期必读的5篇顶会CVPR 2021【行为识别】相关论文和代码

专知会员服务

60+阅读 · 2021年3月17日

最新《模仿学习 - Imitation Learning》教程，63页ppt，微软Kamil Ciosek

最新《模仿学习 - Imitation Learning》教程，63页ppt，微软Kamil Ciosek

专知会员服务

66+阅读 · 2020年8月22日

【CVPR2020-微软-CMU】视频物体分割的一种直推方法，Video Object Segmentation

【CVPR2020-微软-CMU】视频物体分割的一种直推方法，Video Object Segmentation

专知会员服务

7+阅读 · 2020年4月16日

CVPR 2020 论文开源项目合集

专知会员服务

110+阅读 · 2020年3月12日

【中科院自动化所】序列到序列语音识别的无监督预训练（Unsupervised pre-training for sequence to sequence speech recognition）

【中科院自动化所】序列到序列语音识别的无监督预训练（Unsupervised pre-training for sequence to sequence speech recognition）

专知会员服务

33+阅读 · 2020年1月5日

【论文】使用编码器进行命名实体识别（TENER: Adapting Transformer Encoder for Named Entity Recognition）

【论文】使用编码器进行命名实体识别（TENER: Adapting Transformer Encoder for Named Entity Recognition）

专知会员服务

52+阅读 · 2019年12月28日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

《步兵小单元山地严寒作战指南》美军最新条令200页

《联合作战概念的发展》最新报告

俄制无人机弹药

《复杂场景下自主着陆的模型预测控制技术》92页

相关资讯

ICCV 2019 行为识别/视频理解论文汇总

ICCV 2019 行为识别/视频理解论文汇总

极市平台

15+阅读 · 2019年9月26日

「Github」多模态机器学习文章阅读列表

「Github」多模态机器学习文章阅读列表

专知

123+阅读 · 2019年8月15日

无人机视觉挑战赛 | ICCV 2019 Workshop—VisDrone2019

无人机视觉挑战赛 | ICCV 2019 Workshop—VisDrone2019

PaperWeekly

7+阅读 · 2019年5月5日

简评 | Video Action Recognition 的近期进展

简评 | Video Action Recognition 的近期进展

极市平台

20+阅读 · 2019年4月21日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

43+阅读 · 2019年1月3日

【跟踪Tracking】15篇论文+代码 | 中秋快乐~

【跟踪Tracking】15篇论文+代码 | 中秋快乐~

专知

18+阅读 · 2018年9月24日

【论文推荐】最新四篇CVPR2018 视频描述生成相关论文—双向注意力、Transformer、重构网络、层次强化学习

【论文推荐】最新四篇CVPR2018 视频描述生成相关论文—双向注意力、Transformer、重构网络、层次强化学习

专知

31+阅读 · 2018年6月4日

【论文推荐】最新五篇视频分类相关论文—细粒度行人识别、群组归一化、MLtuner、时序特征

【论文推荐】最新五篇视频分类相关论文—细粒度行人识别、群组归一化、MLtuner、时序特征

专知

22+阅读 · 2018年4月21日

暗通沟渠：Multi-lingual Attention

暗通沟渠：Multi-lingual Attention

我爱读PAMI

7+阅读 · 2018年2月24日

相关论文

Attributes-Guided and Pure-Visual Attention Alignment for Few-Shot Recognition

Arxiv

8+阅读 · 2020年12月4日

Scene-based Factored Attention for Image Captioning

Arxiv

4+阅读 · 2019年8月7日

Object-aware Aggregation with Bidirectional Temporal Graph for Video Captioning

Arxiv

3+阅读 · 2019年6月11日

An Attention Enhanced Graph Convolutional LSTM Network for Skeleton-Based Action Recognition

Arxiv

9+阅读 · 2019年3月29日

Relation-aware Graph Attention Network for Visual Question Answering

Arxiv

4+阅读 · 2019年3月29日

Global-and-local attention networks for visual recognition

Global-and-local attention networks for visual recognition

Arxiv

5+阅读 · 2018年9月6日

Video-to-Video Synthesis

Video-to-Video Synthesis

Arxiv

9+阅读 · 2018年8月20日

Learning Human Pose Models from Synthesized Data for Robust RGB-D Action Recognition

Arxiv

3+阅读 · 2018年5月1日

Question Type Guided Attention in Visual Question Answering

Arxiv

5+阅读 · 2018年4月6日

Reconstruction Network for Video Captioning

Arxiv

5+阅读 · 2018年3月30日

微信扫码咨询专知VIP会员