Learning a good 3D human pose representation is important for pose-related tasks such as 3D human pose estimation and action recognition. Across these tasks, preserving the intrinsic pose information and adapting to view variations are two critical issues. In this work, we propose a novel Siamese denoising autoencoder that learns a 3D pose representation by disentangling pose-dependent and view-dependent features from human skeleton data in a fully unsupervised manner. The two disentangled features are used together as the representation of the 3D pose. To account for both kinematic and geometric dependencies, we further propose a sequential bidirectional recursive network (SeBiReNet) to model the human skeleton data. Extensive experiments demonstrate that the learned representation 1) preserves the intrinsic information of human pose and 2) transfers well across datasets and tasks. Notably, our approach achieves state-of-the-art performance on two inherently different tasks: pose denoising and unsupervised action recognition. Code and models are available at: \url{https://github.com/NIEQiang001/unsupervised-human-pose.git}