Estimating 3D human pose and shape from a single image is highly under-constrained. To address this ambiguity, we propose a novel prior, namely the kinematic dictionary, which explicitly regularizes the solution space of relative 3D rotations of human joints in the kinematic tree. Integrated with a statistical human model and a deep neural network, our method achieves end-to-end 3D reconstruction without using any shape annotations during network training. The kinematic dictionary bridges the gap between in-the-wild images and 3D datasets, and thus facilitates end-to-end training across all types of datasets. The proposed method achieves competitive results on large-scale datasets including Human3.6M, MPI-INF-3DHP, and LSP, while running in real time given human bounding boxes.