Large-scale Robustness Analysis of Video Action Recognition Models - Zhuanzhi (专知) Papers

Action recognition · Robustness · Recognition models · Perturbations · Robustness analysis

April 7, 2023

Large-scale Robustness Analysis of Video Action Recognition Models

Madeline Chantry Schiappa, Naman Biyani, Prudvi Kamtam, Shruti Vyas, Hamid Palangi, Vibhav Vineet, Yogesh Rawat

From arXiv; accepted at the 2023 Conference on Computer Vision and Pattern Recognition (CVPR)

Video action recognition has seen great progress in recent years. Several models based on convolutional neural networks (CNNs), along with some recent transformer-based approaches, provide top performance on existing benchmarks. In this work, we perform a large-scale robustness analysis of these existing models for video action recognition. We focus on robustness against real-world distribution-shift perturbations rather than adversarial perturbations. We propose four benchmark datasets, HMDB51-P, UCF101-P, Kinetics400-P, and SSv2-P, to perform this analysis. We study the robustness of six state-of-the-art action recognition models against 90 different perturbations. The study reveals some interesting findings: 1) transformer-based models are consistently more robust than CNN-based models; 2) pretraining improves robustness for transformer-based models more than for CNN-based models; and 3) all of the studied models are robust to temporal perturbations on every dataset except SSv2, suggesting that the importance of temporal information for action recognition varies with the dataset and the activities. Next, we study the role of augmentations in model robustness and present a real-world dataset, UCF101-DS, which contains realistic distribution shifts, to further validate some of these findings. We believe this study will serve as a benchmark for future research in robust video action recognition.
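To make the distinction between spatial and temporal perturbations concrete, here is a minimal sketch in plain Python. It is illustrative only: the abstract does not list the 90 perturbations or define a robustness score, so the two perturbations shown (Gaussian pixel noise and frame shuffling), the helper names, and the `relative_robustness` formula (one minus the accuracy drop relative to clean accuracy) are all assumptions and not the authors' implementation.

```python
import random
from typing import List

Frame = List[List[float]]  # one grayscale frame, pixel values in [0, 1]
Video = List[Frame]        # frames in temporal order


def add_gaussian_noise(video: Video, sigma: float, seed: int = 0) -> Video:
    """Spatial perturbation (assumed example): noise every pixel, keep frame order."""
    rng = random.Random(seed)
    return [
        [[min(1.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in row] for row in frame]
        for frame in video
    ]


def shuffle_frames(video: Video, seed: int = 0) -> Video:
    """Temporal perturbation (assumed example): permute frames, keep pixels intact."""
    rng = random.Random(seed)
    order = list(range(len(video)))
    rng.shuffle(order)
    return [video[i] for i in order]


def relative_robustness(clean_acc: float, perturbed_acc: float) -> float:
    """Assumed score: 1 minus the accuracy drop relative to clean accuracy."""
    if clean_acc <= 0:
        raise ValueError("clean_acc must be positive")
    return 1.0 - (clean_acc - perturbed_acc) / clean_acc


# Toy 4-frame, 2x2 video whose brightness increases over time.
video: Video = [[[t / 4.0] * 2 for _ in range(2)] for t in range(4)]
noisy = add_gaussian_noise(video, sigma=0.1)
shuffled = shuffle_frames(video)
```

Under this assumed score, a model whose accuracy is unchanged by frame shuffling gets a value of 1.0, which is one way to read the abstract's observation that temporal perturbations matter mainly on SSv2: a model that relies on frame order would score lower on `shuffled` inputs than one that relies mostly on appearance.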
