We propose the Motion Capsule Autoencoder (MCAE), which addresses a key challenge in the unsupervised learning of motion representations: transformation invariance. MCAE models motion in a two-level hierarchy. At the lower level, a spatio-temporal motion signal is divided into short, local, semantic-agnostic snippets. At the higher level, the snippets are aggregated to form full-length, semantic-aware segments. At both levels, we represent motion with a set of learned transformation-invariant templates and the corresponding geometric transformations, using capsule autoencoders of a novel design. This yields a robust and efficient encoding of viewpoint changes. MCAE is evaluated on a novel Trajectory20 motion dataset and on several real-world skeleton-based human action datasets. Notably, it outperforms baselines on Trajectory20 with considerably fewer parameters, and achieves state-of-the-art performance on unsupervised skeleton-based action recognition.
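The core idea — describing a motion snippet as a learned, transformation-invariant template plus an explicit geometric transformation — can be illustrated with a minimal conceptual sketch. This is not the authors' implementation: the dimensions, the template mixture, and the affine decoder below are all hypothetical stand-ins for the capsule decoder described in the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes (not from the paper): K learned templates,
# each a short 2-D trajectory snippet of T points.
K, T = 4, 8
templates = rng.normal(size=(K, T, 2))  # transformation-invariant templates


def decode(weights, transform, translation):
    """Reconstruct a snippet as a geometric transform of a template mixture.

    weights     : (K,)   mixture over the learned templates
    transform   : (2, 2) linear part of the affine transform (e.g. rotation)
    translation : (2,)   offset
    """
    base = np.tensordot(weights, templates, axes=(0, 0))  # (T, 2) mixture
    return base @ transform.T + translation               # apply the transform


# Toy "capsule" output for one snippet: template weights + rotation + shift.
weights = np.array([0.7, 0.3, 0.0, 0.0])
theta = np.pi / 6
rot = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])
shift = np.array([1.0, -0.5])

snippet = decode(weights, rot, shift)
print(snippet.shape)  # (8, 2)
```

Because the viewpoint change lives entirely in `(transform, translation)`, the template weights stay the same under rotation or shift of the input, which is the sense in which the representation is transformation-invariant.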