This paper targets unsupervised skeleton-based action representation learning and proposes a new Hierarchical Contrast (HiCo) framework. Existing contrastive solutions typically encode an input skeleton sequence as instance-level features and perform contrast holistically. By comparison, HiCo encodes the input as features at multiple levels and performs contrast hierarchically. Specifically, given a human skeleton sequence, we encode it as multiple feature vectors of different granularities in both the temporal and spatial domains, using sequence-to-sequence (S2S) encoders and unified downsampling modules. The hierarchical contrast is then conducted at four levels: instance level, domain level, clip level, and part level. Moreover, HiCo is orthogonal to the choice of S2S encoder, which allows us to flexibly adopt state-of-the-art S2S encoders. Extensive experiments on four datasets, i.e., NTU-60, NTU-120, PKU-MMD I, and PKU-MMD II, show that HiCo achieves new state-of-the-art results for unsupervised skeleton-based action representation learning on two downstream tasks, action recognition and retrieval, and that its learned action representation transfers well. We also show that our framework is effective for semi-supervised skeleton-based action recognition. Our code is available at https://github.com/HuiGuanLab/HiCo.
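To make the idea of contrasting at several granularities concrete, the following is a minimal sketch in plain Python and is not the released HiCo implementation. It assumes that each view of a skeleton sequence has already been encoded into a sequence of feature vectors, that downsampling is done by averaging consecutive vectors, that each granularity is summarized by an element-wise max, and that an InfoNCE loss is applied per level. All function names (`info_nce`, `mean_pool`, `max_pool`, `hierarchical_contrast`), the pooling choices, and the temperature value are illustrative assumptions.

```python
import math
import random


def l2_normalize(v):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def info_nce(queries, keys, temperature=0.2):
    """InfoNCE over a batch: queries[i] and keys[i] are the positive pair,
    and every other key in the batch acts as a negative."""
    q = [l2_normalize(v) for v in queries]
    k = [l2_normalize(v) for v in keys]
    total = 0.0
    for i, qi in enumerate(q):
        logits = [dot(qi, kj) / temperature for kj in k]
        m = max(logits)  # subtract the max for a stable log-sum-exp
        log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_denom - logits[i]
    return total / len(q)


def mean_pool(seq, factor):
    """Assumed downsampling: average each group of `factor` consecutive vectors."""
    out = []
    for start in range(0, len(seq) - factor + 1, factor):
        group = seq[start:start + factor]
        out.append([sum(col) / factor for col in zip(*group)])
    return out


def max_pool(seq):
    """Summarize a sequence of vectors by an element-wise max."""
    return [max(col) for col in zip(*seq)]


def hierarchical_contrast(view_a, view_b, factor=2, temperature=0.2):
    """Return one InfoNCE loss per granularity, from finest to coarsest.

    view_a, view_b: batches of equal-length, non-empty feature sequences
    coming from two augmented views of the same samples.
    """
    losses = []
    feats_a, feats_b = view_a, view_b
    while True:
        summary_a = [max_pool(seq) for seq in feats_a]
        summary_b = [max_pool(seq) for seq in feats_b]
        losses.append(info_nce(summary_a, summary_b, temperature))
        if len(feats_a[0]) < factor:
            break  # too short to downsample again
        feats_a = [mean_pool(seq, factor) for seq in feats_a]
        feats_b = [mean_pool(seq, factor) for seq in feats_b]
    return losses


# Toy usage: a batch of 3 samples, 4 time steps, 5-dimensional features.
random.seed(0)
batch_a = [[[random.gauss(0, 1) for _ in range(5)] for _ in range(4)]
           for _ in range(3)]
# The second view is a lightly perturbed copy of the first.
batch_b = [[[x + random.gauss(0, 0.1) for x in vec] for vec in seq]
           for seq in batch_a]
level_losses = hierarchical_contrast(batch_a, batch_b)
total_loss = sum(level_losses)
```

In this sketch the per-level losses are simply summed. The same pattern could be applied along the spatial (joint or part) axis, and a real system would substitute learned S2S encoders and its own pooling and weighting choices.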