Accurate estimation of three-dimensional human skeletons from depth images can provide important metrics for healthcare applications, especially for biomechanical gait analysis. However, depth images captured from a single view suffer from inherent problems. The collected data are heavily affected by occlusions, so only partial surface data can be recorded. Furthermore, depth images of the human body exhibit heterogeneous characteristics as the viewpoint changes, and poses estimated under local camera coordinate systems are expected to undergo correspondingly equivariant rotations. Most existing pose estimation models are sensitive to both issues. To address this, we propose a novel approach to cross-view generalization: an occlusion-invariant semi-supervised learning framework built upon a rotation-equivariant backbone. Our model was trained with real-world data from a single view and unlabelled synthetic data from multiple views, and it generalizes well to real-world data from all other, unseen views. Our approach shows superior performance on gait analysis on our ICL-Gait dataset compared to other state-of-the-art methods, and it produces more convincing keypoints on the ITOP dataset than its provided "ground truth".
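To make the rotation-equivariance property concrete: an equivariant pose estimator f should satisfy f(P R^T) = f(P) R^T for any rotation R applied to the input point cloud P, so a viewpoint change rotates the predicted skeleton rather than breaking it. The following is a minimal numpy sketch of that property, using a deliberately simple centroid "estimator" as a stand-in; it is illustrative only and is not the backbone proposed in this paper.

```python
import numpy as np

def random_rotation(rng):
    # Draw a random proper 3D rotation via QR decomposition of a Gaussian matrix.
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q *= np.sign(np.diag(r))      # fix the sign ambiguity of the factorization
    if np.linalg.det(q) < 0:      # ensure det(q) = +1 (rotation, not reflection)
        q[:, 0] *= -1
    return q

def toy_equivariant_pose(points):
    # A trivially equivariant "pose estimator": the centroid of the point cloud.
    # Any map built from linear combinations of the input points commutes with
    # rotations, i.e. f(P R^T) = f(P) R^T.
    return points.mean(axis=0)

rng = np.random.default_rng(0)
cloud = rng.normal(size=(500, 3))   # stand-in for a depth-derived point cloud
R = random_rotation(rng)

lhs = toy_equivariant_pose(cloud @ R.T)   # estimate from the rotated view
rhs = toy_equivariant_pose(cloud) @ R.T   # rotate the estimate from the original view
print(np.allclose(lhs, rhs))              # True: the estimator is equivariant
```

A conventional pose network offers no such guarantee: trained on one viewpoint, its predictions on rotated inputs need not rotate accordingly, which is why cross-view generalization fails without an equivariant design.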