Action recognition via 3D skeleton data is an important emerging topic in recent years. Most existing methods either extract hand-crafted descriptors or learn action representations with supervised learning paradigms that require massive labeled data. In this paper, we propose, for the first time, a contrastive action learning paradigm named AS-CAL that leverages different augmentations of unlabeled skeleton data to learn action representations in an unsupervised manner. Specifically, we first propose to contrast the similarity between augmented instances (query and key) of the input skeleton sequence, which are transformed by multiple novel augmentation strategies, so as to learn the inherent action patterns ("pattern-invariance") across different skeleton transformations. Second, to encourage learning this pattern-invariance with more consistent action representations, we propose a momentum LSTM, implemented as a momentum-based moving average of the LSTM-based query encoder, to encode the long-term action dynamics of the key sequence. Third, we introduce a queue to store the encoded keys, which allows our model to flexibly reuse preceding keys and build a more consistent dictionary that facilitates contrastive learning. Last, by temporally averaging the hidden states of the action sequence learned by the query encoder, we propose a novel representation named Contrastive Action Encoding (CAE) to represent human actions effectively. Extensive experiments show that our approach typically improves existing hand-crafted methods by 10-50% in terms of top-1 accuracy, and it achieves comparable or even superior performance to numerous supervised learning methods.
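To make the described pipeline concrete, below is a minimal PyTorch-style sketch of a MoCo-style contrastive setup with an LSTM query encoder, a momentum-updated key encoder, a key queue, and a temporally averaged encoding. The class name, hyperparameter values (queue size, momentum m, temperature t), and the assumption that the batch size evenly divides the queue size are illustrative choices, not the authors' exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MomentumLSTMContrast(nn.Module):
    """Illustrative sketch: MoCo-style contrastive learning over skeleton sequences."""

    def __init__(self, in_dim, hidden_dim=256, queue_size=4096, m=0.999, t=0.07):
        super().__init__()
        self.m, self.t = m, t
        # Query encoder and its momentum-updated key encoder (same architecture).
        self.encoder_q = nn.LSTM(in_dim, hidden_dim, batch_first=True)
        self.encoder_k = nn.LSTM(in_dim, hidden_dim, batch_first=True)
        for pq, pk in zip(self.encoder_q.parameters(), self.encoder_k.parameters()):
            pk.data.copy_(pq.data)
            pk.requires_grad = False
        # Queue (dictionary) of encoded keys, stored as unit vectors.
        self.register_buffer("queue", F.normalize(torch.randn(queue_size, hidden_dim), dim=1))
        self.register_buffer("queue_ptr", torch.zeros(1, dtype=torch.long))

    @torch.no_grad()
    def _momentum_update(self):
        # Key encoder = momentum-based moving average of the query encoder.
        for pq, pk in zip(self.encoder_q.parameters(), self.encoder_k.parameters()):
            pk.data.mul_(self.m).add_(pq.data, alpha=1.0 - self.m)

    @torch.no_grad()
    def _dequeue_and_enqueue(self, keys):
        # Replace the oldest keys in the queue (assumes queue_size % batch_size == 0).
        bsz = keys.shape[0]
        ptr = int(self.queue_ptr)
        self.queue[ptr:ptr + bsz] = keys
        self.queue_ptr[0] = (ptr + bsz) % self.queue.shape[0]

    def encode_cae(self, seq):
        # Contrastive Action Encoding: temporal average of the query encoder's hidden states.
        h, _ = self.encoder_q(seq)                # (B, T, hidden_dim)
        return F.normalize(h.mean(dim=1), dim=1)  # (B, hidden_dim)

    def forward(self, seq_q, seq_k):
        # seq_q / seq_k: two differently augmented views of the same skeleton sequence.
        q = self.encode_cae(seq_q)
        with torch.no_grad():
            self._momentum_update()
            h_k, _ = self.encoder_k(seq_k)
            k = F.normalize(h_k.mean(dim=1), dim=1)
        # InfoNCE loss: one positive key (same sequence) vs. negatives from the queue.
        l_pos = (q * k).sum(dim=1, keepdim=True)          # (B, 1)
        l_neg = q @ self.queue.clone().detach().T         # (B, queue_size)
        logits = torch.cat([l_pos, l_neg], dim=1) / self.t
        labels = torch.zeros(logits.shape[0], dtype=torch.long, device=logits.device)
        loss = F.cross_entropy(logits, labels)
        self._dequeue_and_enqueue(k)
        return loss
```

After unsupervised pre-training with this objective, the temporally averaged output of `encode_cae` would serve as the action representation (CAE) fed to a downstream classifier.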