Self-supervised learning (SSL), which aims to learn meaningful prior representations from unlabeled data, has proven effective for skeleton-based action understanding. Unlike the image domain, skeleton data possesses sparser spatial structures and diverse representation forms, lacks background cues, and carries an additional temporal dimension, posing new challenges for the design of spatial-temporal motion pretext tasks. Recently, many efforts have been devoted to skeleton-based SSL, achieving remarkable progress. However, a systematic and thorough review is still lacking. In this paper, we conduct, for the first time, a comprehensive survey on self-supervised skeleton-based action representation learning. Following a taxonomy of context-based, generative, and contrastive learning approaches, we thoroughly review and benchmark existing works and shed light on possible future directions. Notably, our investigation shows that most SSL works rely on a single paradigm, learn representations at a single level, and are evaluated solely on the action recognition task, leaving the generalization power of skeleton SSL models under-explored. To this end, we further propose a novel and effective SSL method for skeletons that integrates versatile representation learning objectives of different granularity, substantially boosting generalization across multiple skeleton downstream tasks. Extensive experiments on three large-scale datasets demonstrate that our method achieves superior generalization performance on various downstream tasks, including recognition, retrieval, detection, and few-shot learning.