This paper addresses the problem of 3D human body shape and pose estimation from an RGB image. This is often an ill-posed problem, since multiple plausible 3D bodies may match the visual evidence present in the input, particularly when the subject is occluded. It is therefore desirable to estimate a distribution over 3D body shape and pose conditioned on the input image, rather than a single 3D reconstruction. We train a deep neural network to estimate a hierarchical matrix-Fisher distribution over relative 3D joint rotation matrices (i.e., body pose), which exploits the human body's kinematic tree structure, together with a Gaussian distribution over SMPL body shape parameters. To further ensure that the predicted shape and pose distributions match the visual evidence in the input image, we implement a differentiable rejection sampler that imposes a reprojection loss between ground-truth 2D joint coordinates and samples from the predicted distributions, projected onto the image plane. We show that our method is competitive with the state of the art in terms of 3D shape and pose metrics on the SSP-3D and 3DPW datasets, while also yielding a structured probability distribution over 3D body shape and pose, with which we can meaningfully quantify prediction uncertainty and sample multiple plausible 3D reconstructions to explain a given input image. Code is available at https://github.com/akashsengupta1997/HierarchicalProbabilistic3DHuman.
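As a rough illustration of the reprojection loss described above, the following minimal sketch compares ground-truth 2D joints with several sampled 3D joint sets projected onto the image plane. It is not the authors' implementation. The weak-perspective camera, the function names `project_weak_perspective` and `reprojection_loss`, and the visibility mask are all assumptions made for illustration. The sampled 3D joints are taken as given here, whereas in the paper they would come from pose and shape samples passed through the SMPL model. The sketch is also plain Python and therefore not differentiable; training would require writing the same computation in an automatic-differentiation framework.

```python
def project_weak_perspective(joints_3d, scale, tx, ty):
    """Drop depth, then scale and translate (assumed weak-perspective camera)."""
    return [(scale * x + tx, scale * y + ty) for x, y, _z in joints_3d]


def reprojection_loss(sampled_joints_3d, gt_joints_2d, visible, cam):
    """Mean squared 2D error over visible joints, averaged over all samples.

    sampled_joints_3d: list of samples, each a list of (x, y, z) joints
                       (hypothetical stand-in for SMPL joints of sampled bodies).
    gt_joints_2d:      list of (u, v) ground-truth image coordinates.
    visible:           list of booleans, False for joints to ignore.
    cam:               (scale, tx, ty) weak-perspective camera parameters.
    """
    scale, tx, ty = cam
    total, count = 0.0, 0
    for joints_3d in sampled_joints_3d:
        projected = project_weak_perspective(joints_3d, scale, tx, ty)
        for (u, v), (gu, gv), vis in zip(projected, gt_joints_2d, visible):
            if vis:
                total += (u - gu) ** 2 + (v - gv) ** 2
                count += 1
    return total / count if count else 0.0


# Two hypothetical samples of a two-joint body; depth differs, 2D evidence agrees.
samples = [
    [(0.0, 0.0, 2.0), (1.0, 1.0, 2.5)],
    [(0.0, 0.0, 3.0), (1.0, 1.0, 1.5)],
]
gt_2d = [(0.0, 0.0), (2.0, 2.0)]
loss = reprojection_loss(samples, gt_2d, [True, True], cam=(2.0, 0.0, 0.0))
```

In this toy case both samples project exactly onto the ground-truth joints even though their depths differ, which mirrors the ambiguity the abstract describes: several distinct 3D bodies can explain the same 2D evidence.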