Spatiotemporal Self-attention Modeling with Temporal Patch Shift for Action Recognition

Transformer-based methods have recently made great advances on 2D image-based vision tasks. For 3D video-based tasks such as action recognition, however, directly applying spatiotemporal transformers to video data brings heavy computation and memory burdens, because the number of patches grows substantially and self-attention has quadratic complexity in the number of patches. Modeling the 3D self-attention of video data both efficiently and effectively has therefore been a major challenge for transformers. In this paper, we propose a Temporal Patch Shift (TPS) method for efficient 3D self-attention modeling in transformers for video-based action recognition. TPS shifts a subset of patches along the temporal dimension following a specific mosaic pattern, thus converting a vanilla spatial self-attention operation into a spatiotemporal one at little additional cost. As a result, we can compute 3D self-attention using nearly the same computation and memory cost as 2D self-attention. TPS is a plug-and-play module and can be inserted into existing 2D transformer models to enhance spatiotemporal feature learning. The proposed method achieves performance competitive with state-of-the-art methods on Something-Something V1 & V2, Diving-48, and Kinetics400 while being much more efficient in computation and memory cost. The source code of TPS can be found at https://github.com/MartinXM/TPS.
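The shifting idea described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `temporal_patch_shift`, the 3x3 offset pattern, the tensor layout `(T, H, W, C)`, and the wrap-around behaviour at clip boundaries are all assumptions made for illustration. The actual mosaic patterns are defined in the paper and its repository.

```python
import numpy as np


def temporal_patch_shift(tokens, pattern):
    """Shift patches along time according to a tiled offset pattern.

    tokens:  array of shape (T, H, W, C), one C-dim token per patch per frame.
    pattern: small 2D integer array of temporal offsets (hypothetical values),
             tiled across the H x W patch grid. Offset d means the patch at
             frame t is replaced by the same-position patch from frame t - d.
    """
    tokens = np.asarray(tokens)
    pattern = np.asarray(pattern)
    T, H, W, C = tokens.shape
    ph, pw = pattern.shape
    # Tile the small pattern so it covers the whole patch grid, then crop.
    reps = (-(-H // ph), -(-W // pw))  # ceiling division
    offsets = np.tile(pattern, reps)[:H, :W]

    shifted = np.empty_like(tokens)
    for d in np.unique(offsets):
        mask = offsets == d  # (H, W) boolean: grid positions with offset d
        # Assumption: frames wrap around at the clip boundary (np.roll).
        rolled = np.roll(tokens, int(d), axis=0)
        shifted[:, mask] = rolled[:, mask]
    return shifted


# Illustrative pattern: most positions stay put, some look one frame back or ahead.
PATTERN = np.array([[0, 1, 0],
                    [-1, 0, 1],
                    [0, -1, 0]])

T, H, W, C = 4, 6, 6, 2
x = np.arange(T * H * W * C, dtype=np.float32).reshape(T, H, W, C)
y = temporal_patch_shift(x, PATTERN)        # each frame now mixes patches from neighbours
x_back = temporal_patch_shift(y, -PATTERN)  # negated offsets undo the shift
```

After the shift, each "frame" of `y` contains patches drawn from several neighbouring frames, so an ordinary per-frame spatial attention over `y` would mix information across time without enlarging the attention matrix. One plausible reading of the abstract is that the shift is applied before spatial attention and reversed afterwards, as `x_back` does here. To see why this matters, take an illustrative clip of 8 frames with 196 patches each: joint attention over all 8 x 196 = 1568 tokens compares 1568^2 = 2,458,624 token pairs, whereas per-frame attention compares 8 x 196^2 = 307,328.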

