关于跨域行动强力行动识别的时间建模事项 (Recur, Attend or Convolve? On Whether Temporal Modeling Matters for Cross-Domain Robustness in Action Recognition)

Most action recognition models today are highly parameterized, and evaluated on datasets with predominantly spatially distinct classes. It has also been shown that 2D Convolutional Neural Networks (CNNs) tend to be biased toward texture rather than shape in still image recognition tasks. Taken together, this raises suspicion that large video models partly learn spurious correlations rather than to track relevant shapes over time to infer generalizable semantics from their movement. A natural way to avoid parameter explosion when learning visual patterns over time is to make use of recurrence. In this article, we empirically study whether the choice of low-level temporal modeling has consequences for texture bias and cross-domain robustness. In order to enable a light-weight and systematic assessment of the ability to capture temporal structure, not revealed from single frames, we provide the Temporal Shape (TS) dataset, as well as modified domains of Diving48 allowing for the investigation of texture bias for video models. We find that across a variety of model sizes, convolutional-recurrent and attention-based models show better out-of-domain robustness on TS than 3D CNNs. In domain shift experiments on Diving48, our experiments indicate that 3D CNNs and attention-based models exhibit more texture bias than convolutional-recurrent models. Moreover, qualitative examples suggest that convolutional-recurrent models learn more correct class attributes from the diving data when compared to the other two types of models at the same global validation performance.

翻译：今天,大多数行动识别模型都是高度参数化的,并且对以空间特征为主的类别数据集进行了评估。还显示,2D进化神经网络(CNNs)倾向于偏向于纹理,而不是在图像识别任务中形成形状。加在一起,这使人怀疑大型视频模型部分学习假相,而不是跟踪相关形状,以便从移动中推断出一般的语义学。在学习视觉模式时避免参数爆炸的自然方法是利用反复出现。在本篇文章中,我们实验研究选择低级时间模型是否对纹理偏差和跨部坚固性产生后果。为了能够对捕捉取时间结构的能力进行轻度和系统评估,而不是从单一框架显示,我们提供了Temoral 形状数据集,以及经过修改的Diving48域域,以便能够对视频模型的文字偏差进行调查。我们发现,在各种模型的大小、进化、经常和关注模型中,低级模型显示TS&TSA的坚固度,比3D级的直径直径性模型要好。

相关内容

MoDELS

关注 43

ACM/IEEE第23届模型驱动工程语言和系统国际会议，是模型驱动软件和系统工程的首要会议系列，由ACM-SIGSOFT和IEEE-TCSE支持组织。自1998年以来，模型涵盖了建模的各个方面，从语言和方法到工具和应用程序。模特的参加者来自不同的背景，包括研究人员、学者、工程师和工业专业人士。MODELS 2019是一个论坛，参与者可以围绕建模和模型驱动的软件和系统交流前沿研究成果和创新实践经验。今年的版本将为建模社区提供进一步推进建模基础的机会，并在网络物理系统、嵌入式系统、社会技术系统、云计算、大数据、机器学习、安全、开源等新兴领域提出建模的创新应用以及可持续性。官网链接：http://www.modelsconference.org/

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

95+阅读 · 2020年3月12日

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

19+阅读 · 2019年10月22日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日