In this work, we study self-supervised representation learning for 3D skeleton-based action recognition. We extend Bootstrap Your Own Latent (BYOL) to representation learning on skeleton sequence data and propose a new data augmentation strategy comprising two asymmetric transformation pipelines. We also introduce a multi-viewpoint sampling method that leverages multiple viewing angles of the same action captured by different cameras. In the semi-supervised setting, we show that performance can be further improved by knowledge distillation from wider networks, leveraging the unlabeled samples once more. We conduct extensive experiments on the NTU-60 and NTU-120 datasets to demonstrate the effectiveness of our proposed method. Our method consistently outperforms the current state of the art on both the linear-evaluation and semi-supervised benchmarks.