3D human body reconstruction from monocular images is an interesting yet ill-posed problem in computer vision, with wide applications across multiple domains. In this paper, we propose SHARP, a novel end-to-end trainable network that accurately recovers the detailed geometry and appearance of 3D people in loose clothing from a monocular image. We propose a sparse and efficient fusion of a parametric body prior with a non-parametric peeled depth map representation of clothed models. The parametric body prior constrains our model in two ways: first, the network retains geometrically consistent body parts that are not occluded by clothing, and second, it provides a body shape context that improves prediction of the peeled depth maps. This enables SHARP to recover fine-grained 3D geometric details using only L1 losses on the 2D maps, given an input image. We evaluate SHARP on the publicly available Cloth3D and THuman datasets and report superior performance over state-of-the-art approaches.
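To make the supervision scheme mentioned above concrete, the following is a minimal sketch (not the authors' implementation) of an L1 loss over peeled depth maps, with the parametric body prior fused into the network input; the layer count, tensor shapes, and concatenation-based fusion are assumptions for illustration only.

```python
# Minimal sketch of L1 supervision on peeled depth maps (assumptions:
# 4 peeled layers, 512x512 resolution, fusion of the body prior by
# concatenating its rendered depth with the RGB input).
import torch
import torch.nn as nn

class PeeledDepthL1Loss(nn.Module):
    """L1 loss averaged over all peeled depth layers."""
    def forward(self, pred_depths, gt_depths):
        # pred_depths, gt_depths: (B, L, H, W), L = number of peeled layers
        return torch.abs(pred_depths - gt_depths).mean()

# Hypothetical usage
B, L, H, W = 2, 4, 512, 512
image = torch.rand(B, 3, H, W)             # monocular RGB input
prior_depth = torch.rand(B, 1, H, W)       # depth render of the parametric body prior
net_input = torch.cat([image, prior_depth], dim=1)  # assumed fusion of image and prior

pred = torch.rand(B, L, H, W, requires_grad=True)   # stand-in for network output
gt = torch.rand(B, L, H, W)                          # ground-truth peeled depth maps
loss = PeeledDepthL1Loss()(pred, gt)
loss.backward()
```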