In this work, we propose a Cross-view Contrastive Learning framework for unsupervised 3D skeleton-based action Representation (CrosSCLR), which leverages multi-view complementary supervision signals. CrosSCLR consists of a single-view contrastive learning module (SkeletonCLR) and a cross-view consistent knowledge mining module (CVC-KM), integrated in a collaborative learning manner. CVC-KM exchanges high-confidence positive/negative samples and their distributions among views according to their embedding similarity, ensuring cross-view consistency of the contrastive context, i.e., similar distributions. Extensive experiments show that CrosSCLR achieves remarkable action recognition results on the NTU-60 and NTU-120 datasets under unsupervised settings, and yields higher-quality action representations. Our code is available at https://github.com/LinguoLi/CrosSCLR.
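The cross-view mining idea above can be sketched in code. The snippet below is a minimal, hypothetical illustration (not the authors' implementation): for each of two skeleton views, the top-k most similar memory-bank entries found in the *other* view serve as extra positives in an InfoNCE-style loss, so high-confidence samples are exchanged across views. All function and variable names here are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def cross_view_mining_loss(z_a, z_b, bank_a, bank_b, temperature=0.07, topk=1):
    """Sketch of cross-view consistent knowledge mining (hypothetical).

    z_a, z_b:       L2-normalized query embeddings from two skeleton views
                    (e.g., joint and motion streams), shape (N, D).
    bank_a, bank_b: L2-normalized memory banks per view, shape (K, D).
    """
    # Cosine similarities between queries and each view's memory bank.
    sim_a = z_a @ bank_a.t() / temperature   # (N, K), view A
    sim_b = z_b @ bank_b.t() / temperature   # (N, K), view B

    # Indices of high-confidence neighbours mined in each view.
    pos_from_b = sim_b.topk(topk, dim=1).indices
    pos_from_a = sim_a.topk(topk, dim=1).indices

    def nce(sim, pos_idx):
        # InfoNCE-style loss where mined entries act as positives.
        log_prob = F.log_softmax(sim, dim=1)
        return -log_prob.gather(1, pos_idx).mean()

    # Each view is supervised by positives mined in the other view,
    # encouraging consistent contrastive contexts across views.
    return nce(sim_a, pos_from_b) + nce(sim_b, pos_from_a)
```

In practice the memory banks would be maintained MoCo-style (momentum-updated key encoder plus a queue), and the per-sample similarity distributions, not just the top-k indices, can be exchanged to align contrastive contexts, as the abstract describes.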