Standard multi-modal models assume the use of the same modalities in the training and inference stages. In practice, however, the environment in which multi-modal models operate may not satisfy such an assumption, and their performance degrades drastically if any modality is missing at inference time. We ask: how can we train a model that is robust to missing modalities? This paper seeks a set of good practices for multi-modal action recognition, with a particular interest in circumstances where some modalities are not available at inference time. First, we study how to effectively regularize the model during training (e.g., data augmentation). Second, we investigate fusion methods for robustness to missing modalities: we find that transformer-based fusion is more robust to missing modalities than summation or concatenation. Third, we propose a simple modular network, ActionMAE, which learns missing-modality predictive coding by randomly dropping modality features and reconstructing them from the remaining modality features. Coupling these good practices, we build a model that is not only effective in multi-modal action recognition but also robust to missing modalities. Our model achieves state-of-the-art results on multiple benchmarks and maintains competitive performance even in missing-modality scenarios. Code is available at https://github.com/sangminwoo/ActionMAE.
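To make the random modality-drop idea concrete, here is a minimal sketch of the training-time masking and reconstruction setup. It is not the paper's architecture: the modality names, feature shapes, and the mean-based "decoder" are hypothetical placeholders (ActionMAE uses learned encoder/decoder modules), shown only to illustrate dropping one modality's features and reconstructing them from the rest.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-modality features (batch=2, dim=4) for three modalities.
feats = {m: rng.standard_normal((2, 4)) for m in ["rgb", "depth", "ir"]}

def drop_random_modality(feats, rng):
    """Randomly drop one modality; return the remaining features,
    the name of the dropped modality, and its features as the target."""
    dropped = rng.choice(list(feats))
    remaining = {m: f for m, f in feats.items() if m != dropped}
    return remaining, dropped, feats[dropped]

remaining, dropped, target = drop_random_modality(feats, rng)

# Placeholder reconstruction: predict the dropped features as the mean of
# the remaining ones; in ActionMAE a learned decoder plays this role.
pred = np.mean(list(remaining.values()), axis=0)
recon_loss = np.mean((pred - target) ** 2)  # MSE reconstruction objective
```

At inference, the same model can then be fed only the available modalities, since training has already exposed it to every drop pattern.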