Learning to regress 3D human body shape and pose (e.g., SMPL parameters) from monocular images typically exploits losses on 2D keypoints, silhouettes, and/or part segmentation when 3D training data is not available. Such losses, however, are limited: 2D keypoints do not supervise body shape, and segmentations of people in clothing do not match projected minimally-clothed SMPL shapes. To exploit richer image information about clothed people, we introduce higher-level semantic information about clothing to penalize clothed and non-clothed regions of the image differently. To do so, we train a body regressor using a novel Differentiable Semantic Rendering (DSR) loss. For minimally-clothed regions, we define the DSR-MC loss, which encourages a tight match between a rendered SMPL body and the minimally-clothed regions of the image. For clothed regions, we define the DSR-C loss, which encourages the rendered SMPL body to lie inside the clothing mask. To ensure end-to-end differentiable training, we learn a semantic clothing prior for SMPL vertices from thousands of clothed human scans. We perform extensive qualitative and quantitative experiments to evaluate the role of clothing semantics in the accuracy of 3D human pose and shape estimation. We outperform all previous state-of-the-art methods on 3DPW and Human3.6M and obtain on-par results on MPI-INF-3DHP. Code and trained models are available for research at https://dsr.is.tue.mpg.de/.
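To make the asymmetry between the two losses concrete, here is a minimal NumPy sketch. It is not the authors' implementation. It assumes a differentiable renderer has already turned the per-vertex clothing prior into per-pixel soft maps (the hypothetical inputs rendered_mc and rendered_clothed) and that the image-side masks come from a clothing segmentation. The soft-Dice form chosen for DSR-MC and all function names are illustrative assumptions. The point it shows: DSR-MC is symmetric, so it penalizes mismatch in both directions. DSR-C is one-sided, so it penalizes only rendered body mass that falls outside the clothing mask.

```python
import numpy as np


def dsr_mc_loss(rendered_mc: np.ndarray, image_mc: np.ndarray, eps: float = 1e-6) -> float:
    # Hypothetical DSR-MC sketch: symmetric soft-Dice loss. It penalizes both
    # rendered pixels outside the image's minimally-clothed mask and mask
    # pixels that the render fails to cover ("tight match").
    inter = float(np.sum(rendered_mc * image_mc))
    total = float(np.sum(rendered_mc) + np.sum(image_mc))
    return 1.0 - (2.0 * inter + eps) / (total + eps)


def dsr_c_loss(rendered_clothed: np.ndarray, image_clothing: np.ndarray, eps: float = 1e-6) -> float:
    # Hypothetical DSR-C sketch: one-sided loss. Only rendered body mass lying
    # OUTSIDE the clothing mask is penalized; loose clothing that the body
    # does not fill costs nothing ("body inside clothing").
    outside = float(np.sum(rendered_clothed * (1.0 - image_clothing)))
    return outside / (float(np.sum(rendered_clothed)) + eps)


if __name__ == "__main__":
    cloth = np.zeros((4, 4))
    cloth[:, :3] = 1.0  # loose clothing covers three columns
    body = np.zeros((4, 4))
    body[:, 1:3] = 1.0  # rendered body fills only two of them
    print(dsr_c_loss(body, cloth))   # 0.0: the body lies inside the clothing
    print(dsr_mc_loss(body, cloth))  # roughly 0.2: a tight-match loss still penalizes the gap
```

Used on the same loose-clothing example, the one-sided loss leaves the body free to be thinner than the clothing, while a tight-match loss would wrongly push the body outward to fill the garment. Applying the tight loss only where the skin is visible avoids that bias.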