Recovering 3D human pose from 2D joints is a highly ill-posed problem, especially in the absence of video or multi-view information. We present an unsupervised GAN-based model that recovers 3D human pose from 2D joint locations extracted from a single image. Our model uses a GAN to learn the mapping between the 2D and 3D pose distributions, rather than a simple 2D-to-3D correspondence. By exploiting the reprojection constraint, our model also estimates the camera, so the estimated 3D pose can be reprojected onto the original 2D pose. Building on this reprojection, we rotate and reproject the generated pose to obtain a "new" 2D pose, then use a weight-sharing generator to estimate a "new" 3D pose and a "new" camera. This estimation process lets us define a single-view-multi-angle consistency loss during training to simulate multi-view consistency: the 3D poses and cameras estimated from two angles of a single view should be able to be mixed to generate rich 2D reprojections, and the 2D reprojections of the same 3D pose should be consistent. Experimental results on Human3.6M show that our method outperforms all state-of-the-art methods, and results on MPI-INF-3DHP show that our method outperforms the state of the art by approximately 15.0%.
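The rotate-reproject-relift cycle described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes a weak-perspective camera (drop the depth coordinate) and a rotation about the vertical axis, and the `generator` argument is a hypothetical stand-in for the learned weight-sharing lifting network.

```python
import numpy as np

def rotate_y(pose3d, angle):
    """Rotate an (N, 3) joint array about the vertical (y) axis."""
    c, s = np.cos(angle), np.sin(angle)
    R = np.array([[c, 0.0, s],
                  [0.0, 1.0, 0.0],
                  [-s, 0.0, c]])
    return pose3d @ R.T

def project(pose3d):
    """Weak-perspective projection (assumption): keep x, y; drop depth."""
    return pose3d[:, :2]

def consistency_loss(pose2d, pose3d, angle, generator):
    """Rotate the estimated 3D pose, reproject it to a "new" 2D pose,
    re-lift it with the generator, rotate back, and compare the
    resulting reprojection with the original 2D pose."""
    new2d = project(rotate_y(pose3d, angle))    # "new" 2D view
    new3d = generator(new2d)                    # re-estimated "new" 3D pose
    back2d = project(rotate_y(new3d, -angle))   # rotate back and reproject
    return float(np.mean((back2d - pose2d) ** 2))

# Toy example: a 3-joint pose and an "oracle" generator that lifts the new
# 2D view exactly; the consistency loss is then zero.
pose3d = np.array([[0.0, 1.0, 0.2],
                   [0.3, 0.5, -0.1],
                   [-0.3, 0.5, 0.1]])
pose2d = project(pose3d)
angle = np.pi / 6
oracle = lambda new2d: rotate_y(pose3d, angle)  # hypothetical perfect lift
loss = consistency_loss(pose2d, pose3d, angle, oracle)
```

During training, an imperfect generator produces a nonzero loss, which is what drives the estimated 3D poses toward being consistent across the simulated viewpoints.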