We present TIPSy-GAN, a new approach to improving accuracy and stability in unsupervised adversarial 2D to 3D human pose estimation. We demonstrate that the human kinematic skeleton should not be treated as a single spatially codependent structure. In fact, we believe that when a full 2D pose is provided during training, the model learns an inherent bias in which the 3D coordinate of a keypoint is spatially codependent on the 2D locations of all other keypoints. To test this hypothesis, we follow previous adversarial approaches but train two generators on spatially independent parts of the kinematic skeleton: the torso and the legs. We find that improving the 2D reprojection self-consistency cycle is key to lowering the evaluation error, and we therefore introduce new consistency constraints during training. A TIPSy model is then produced via knowledge distillation from these generators; it predicts the 3D coordinates for the entire 2D pose with improved results. Furthermore, we address a question left unanswered in prior work: how long to train in a truly unsupervised scenario. We show that two independent generators trained adversarially are more stable than a solo generator, which collapses because the adversarial network becomes unstable. TIPSy decreases the average error by 18% compared to a baseline solo generator. TIPSy improves upon other unsupervised approaches while also performing strongly against supervised and weakly-supervised approaches during evaluation on both the Human3.6M and MPI-INF-3DHP datasets.
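To make the idea of a 2D reprojection self-consistency cycle concrete, the following is a minimal sketch of one generic form of such a cycle: lift a 2D pose to 3D, rotate it, project it back to 2D, lift again, undo the rotation, and compare the result with the input. Everything here is an assumption for illustration rather than the authors' implementation: the orthographic projection, the rotation about the vertical axis, the toy 4-keypoint pose and its split into two parts, and the placeholder `flat_depth` predictor that stands in for a trained generator. It does not reproduce the new consistency constraints the abstract refers to.

```python
import math

def lift(pose2d, depth_fn):
    """Attach a predicted depth to each 2D keypoint (orthographic assumption)."""
    depths = depth_fn(pose2d)
    return [(x, y, z) for (x, y), z in zip(pose2d, depths)]

def rotate_y(pose3d, angle):
    """Rotate 3D keypoints about the vertical (y) axis."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x + s * z, y, -s * x + c * z) for x, y, z in pose3d]

def project(pose3d):
    """Orthographic projection: drop the depth coordinate."""
    return [(x, y) for x, y, _ in pose3d]

def cycle_error(pose2d, depth_fn, angle):
    """Mean 2D distance between the input pose and its round trip."""
    rotated2d = project(rotate_y(lift(pose2d, depth_fn), angle))
    back2d = project(rotate_y(lift(rotated2d, depth_fn), -angle))
    dists = [math.hypot(xa - xb, ya - yb)
             for (xa, ya), (xb, yb) in zip(pose2d, back2d)]
    return sum(dists) / len(dists)

# Hypothetical toy pose, split into two independently processed parts.
pose2d = [(0.0, 1.0), (0.3, 0.8), (0.1, -0.5), (-0.1, -1.0)]
torso2d, legs2d = pose2d[:2], pose2d[2:]

def flat_depth(part2d):
    """Placeholder 'generator' that predicts zero depth for every keypoint."""
    return [0.0 for _ in part2d]

# Each part is lifted and checked using only its own 2D keypoints.
torso_err = cycle_error(torso2d, flat_depth, math.pi / 4)
legs_err = cycle_error(legs2d, flat_depth, math.pi / 4)
```

In this sketch a predictor that ignores depth fails the round trip for any non-zero rotation, which is the kind of signal a cycle loss can penalise without 3D ground truth. Because `cycle_error` is evaluated on `torso2d` and `legs2d` separately, neither part's prediction can depend on the other part's 2D locations.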