Skeleton-based human action recognition has attracted increasing attention in recent years. However, most existing work focuses on supervised learning, which requires a large number of annotated action sequences that are often expensive to collect. We investigate unsupervised representation learning for skeleton action recognition and design a novel skeleton cloud colorization technique that learns skeleton representations from unlabeled skeleton sequence data. Specifically, we represent a skeleton action sequence as a 3D skeleton cloud and colorize each point in the cloud according to its temporal and spatial order in the original (unannotated) skeleton sequence. Leveraging the colorized skeleton point cloud, we design an auto-encoder framework that effectively learns spatial-temporal features from the artificial color labels of the skeleton joints. We evaluate our skeleton cloud colorization approach with action classifiers trained under different configurations, including unsupervised, semi-supervised, and fully supervised settings. Extensive experiments on the NTU RGB+D and NW-UCLA datasets show that the proposed method outperforms existing unsupervised and semi-supervised 3D action recognition methods by large margins, and that it also achieves competitive performance in supervised 3D action recognition.
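To make the colorization step concrete, the following is a minimal sketch of how a skeleton sequence could be flattened into a point cloud whose points carry artificial color labels. The abstract does not specify the color mapping, so the red-to-green-to-blue ramp used here is an assumption, and the names `order_to_rgb` and `colorize_skeleton_cloud` are hypothetical and are not taken from the authors' implementation.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]
Color = Tuple[float, float, float]


def order_to_rgb(index: int, count: int) -> Color:
    """Map an order index in [0, count) to RGB on a red -> green -> blue ramp.

    Assumption: this ramp is only one plausible choice; the abstract does
    not state which color mapping the method uses.
    """
    if count <= 1:
        return (1.0, 0.0, 0.0)
    s = index / (count - 1)  # normalized position in [0, 1]
    if s <= 0.5:
        g = 2.0 * s
        return (1.0 - g, g, 0.0)
    b = 2.0 * (s - 0.5)
    return (0.0, 1.0 - b, b)


def colorize_skeleton_cloud(
    sequence: List[List[Point]],
) -> List[Tuple[Point, Color, Color]]:
    """Stack all joints of all frames into one cloud of colored points.

    Each output item is (xyz, temporal_color, spatial_color), where the
    temporal color encodes the frame index and the spatial color encodes
    the joint index within its frame.
    """
    num_frames = len(sequence)
    cloud = []
    for t, frame in enumerate(sequence):
        num_joints = len(frame)
        for j, xyz in enumerate(frame):
            cloud.append(
                (xyz, order_to_rgb(t, num_frames), order_to_rgb(j, num_joints))
            )
    return cloud


# Toy input: 3 frames, 2 joints per frame (coordinates are made up).
demo_sequence = [
    [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
    [(0.1, 0.0, 0.0), (0.1, 1.0, 0.0)],
    [(0.2, 0.0, 0.0), (0.2, 1.0, 0.0)],
]
demo_cloud = colorize_skeleton_cloud(demo_sequence)
```

In this sketch the raw coordinates serve as the input and the color labels serve as self-supervised targets, so an auto-encoder trained to predict a point's colors would have to recover when and where in the sequence that point occurs. No manual annotation is involved, since the labels are derived from frame and joint indices alone.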