Contrastive learning has proven beneficial for self-supervised skeleton-based action recognition. Most contrastive learning methods use carefully designed augmentations to generate different movement patterns of skeletons with the same semantics. However, applying strong augmentations, which distort the structure of images or skeletons and cause semantic loss, remains an open problem because they make training unstable. In this paper, we investigate the potential of strong augmentations and propose a general hierarchical consistent contrastive learning framework (HiCLR) for skeleton-based action recognition. Specifically, we first design a gradually growing augmentation policy to generate multiple ordered positive pairs, which guide the learned representations toward consistency across different views. Then, we propose an asymmetric loss that enforces hierarchical consistency via a directional clustering operation in the feature space, pulling the representations of strongly augmented views closer to those of weakly augmented views for better generalizability. In addition, we propose and evaluate three kinds of strong augmentations for 3D skeletons to demonstrate the effectiveness of our method. Extensive experiments show that HiCLR notably outperforms state-of-the-art methods on three large-scale datasets: NTU60, NTU120, and PKUMMD.
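The two ideas above, ordered views from a growing augmentation chain and a one-directional pull from stronger to weaker views, can be illustrated with a minimal NumPy sketch. Everything in it is an assumption made for illustration: the function names (`growing_views`, `directional_consistency`, `hierarchical_loss`), the use of a KL divergence over similarities to a reference bank, and the temperature `tau` are hypothetical and are not taken from the HiCLR formulation, which this abstract does not specify.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def growing_views(x, augmentations):
    """Return ordered views: view k has the first k augmentations applied.

    Hypothetical helper; each augmentation is a callable on a skeleton array.
    """
    views = [x]
    for aug in augmentations:
        views.append(aug(views[-1]))  # each step builds on the previous view
    return views


def directional_consistency(z_weak, z_strong, bank, tau=0.1):
    """KL(p_weak || p_strong) over cosine similarities to a reference bank.

    The weak-view distribution plays the role of a fixed target; in an
    autograd framework it would be detached so that only the strong view
    is pulled toward it (assumed design, not the paper's exact loss).
    """
    def dist(z):
        z = z / np.linalg.norm(z, axis=1, keepdims=True)
        b = bank / np.linalg.norm(bank, axis=1, keepdims=True)
        return softmax(z @ b.T / tau)

    p = dist(z_weak)    # target distribution (treated as constant)
    q = dist(z_strong)  # distribution being pulled toward the target
    eps = 1e-12
    kl = np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=1)
    return float(np.mean(kl))


def hierarchical_loss(embeddings, bank, tau=0.1):
    """Sum the directional term over adjacent (weaker, stronger) view pairs."""
    return sum(
        directional_consistency(embeddings[i], embeddings[i + 1], bank, tau)
        for i in range(len(embeddings) - 1)
    )
```

Here `embeddings` is assumed to be a list of encoder outputs, one array of shape `(batch, dim)` per view, ordered from the weakest to the strongest augmentation. The term is zero when adjacent views already agree and grows as the stronger view drifts from its weaker neighbor.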