Decoupling spatiotemporal representation refers to decomposing spatial and temporal features into dimension-independent factors. Although previous RGB-D-based motion recognition methods have achieved promising performance through tightly coupled multi-modal spatiotemporal representations, they still suffer from (i) optimization difficulty under small-data settings due to the tightly spatiotemporal-entangled modeling; (ii) information redundancy, as such representations usually contain abundant marginal information that is weakly relevant to classification; and (iii) low interaction between multi-modal spatiotemporal information caused by insufficient late fusion. To alleviate these drawbacks, we propose to decouple and recouple the spatiotemporal representation for RGB-D-based motion recognition. Specifically, we disentangle the task of learning spatiotemporal representations into three sub-tasks: (1) learning high-quality, dimension-independent features through a decoupled spatial and temporal modeling network; (2) recoupling the decoupled representation to establish stronger space-time dependency; and (3) introducing a Cross-modal Adaptive Posterior Fusion (CAPF) mechanism to capture cross-modal spatiotemporal information from RGB-D data. The seamless combination of these novel designs forms a robust spatiotemporal representation and achieves better performance than state-of-the-art methods on four public motion datasets. Our code is available at https://github.com/damo-cv/MotionRGBD.
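The three sub-tasks above can be made concrete with a minimal, self-contained PyTorch sketch of the decouple-recouple idea and an adaptive late fusion. All names (`DecoupleRecouple`, `fuse_rgb_depth`) and the specific layer choices are illustrative assumptions, not the authors' implementation; see the linked repository for the actual code.

```python
# Minimal sketch of decoupled spatiotemporal modeling with recoupling.
# Hypothetical module names and shapes; not the authors' implementation.
import torch
import torch.nn as nn


class DecoupleRecouple(nn.Module):
    """Decouple spatial/temporal modeling, then recouple the two factors."""

    def __init__(self, channels: int = 64):
        super().__init__()
        # Spatial branch: per-frame 2D-style convolution (kernel 1x3x3).
        self.spatial = nn.Conv3d(channels, channels, (1, 3, 3), padding=(0, 1, 1))
        # Temporal branch: convolution along time only (kernel 3x1x1).
        self.temporal = nn.Conv3d(channels, channels, (3, 1, 1), padding=(1, 0, 0))
        # Recoupling: re-establish space-time dependency over both factors.
        self.recouple = nn.Conv3d(2 * channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, time, height, width)
        s = self.spatial(x)   # dimension-independent spatial features
        t = self.temporal(x)  # dimension-independent temporal features
        return self.recouple(torch.cat([s, t], dim=1))


def fuse_rgb_depth(feat_rgb: torch.Tensor, feat_depth: torch.Tensor) -> torch.Tensor:
    """Stand-in for the CAPF idea: weight the two modalities adaptively
    instead of naive late averaging. The softmax over global feature energy
    is purely illustrative, not the paper's fusion rule."""
    energy = torch.stack([feat_rgb.mean(), feat_depth.mean()])
    w = torch.softmax(energy, dim=0)
    return w[0] * feat_rgb + w[1] * feat_depth


if __name__ == "__main__":
    block = DecoupleRecouple(channels=64)
    rgb = torch.randn(2, 64, 8, 56, 56)    # RGB clip features
    depth = torch.randn(2, 64, 8, 56, 56)  # depth clip features
    fused = fuse_rgb_depth(block(rgb), block(depth))
    print(fused.shape)  # torch.Size([2, 64, 8, 56, 56])
```

In this sketch, the factorized 1x3x3 and 3x1x1 convolutions play the roles of the decoupled spatial and temporal branches, a 1x1x1 convolution over their concatenation stands in for recoupling, and the fusion weights merely illustrate adapting to the two modalities rather than averaging them.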