We present a deep neural network methodology to reconstruct the 3d pose and shape of people, given an input RGB image. We rely on a recently introduced, expressive full body statistical 3d human model, GHUM, trained end-to-end, and learn to reconstruct its pose and shape state in a self-supervised regime. Central to our methodology is a learning to learn and optimize approach, referred to as HUman Neural Descent (HUND), which avoids both second-order differentiation when training the model parameters, and expensive state gradient descent in order to accurately minimize a semantic differentiable rendering loss at test time. Instead, we rely on novel recurrent stages to update the pose and shape parameters such that not only are losses minimized effectively, but the process is meta-regularized in order to ensure end-progress. HUND's symmetry between training and testing makes it the first 3d human sensing architecture to natively support different operating regimes including self-supervised ones. In diverse tests, we show that HUND achieves very competitive results on datasets like H3.6M and 3DPW, as well as good quality 3d reconstructions for complex imagery collected in-the-wild.
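The idea of replacing test-time gradient descent with a fixed number of update stages can be illustrated with a small toy sketch. This is not the HUND architecture: the quadratic loss stands in for the semantic differentiable rendering loss, the fixed matrix `W` stands in for a trained recurrent update network, and every name below (`make_toy_problem`, `update_stage`, `run_stages`) is hypothetical. The sketch only shows the interface: each stage maps the current state and its residual to a new state using forward computation alone, with no gradient of the loss taken at test time.

```python
import numpy as np


def make_toy_problem(seed=0, n_obs=12, n_state=5):
    """Toy stand-in: observations y are a linear function of a hidden state."""
    rng = np.random.default_rng(seed)
    A = rng.normal(size=(n_obs, n_state))  # hypothetical "observation" map
    target = rng.normal(size=n_state)      # hypothetical true pose/shape state
    y = A @ target
    return A, y, target


def loss(A, y, state):
    """Squared residual; a placeholder for a rendering-style loss."""
    r = A @ state - y
    return float(r @ r)


def update_stage(W, A, y, state):
    """One update stage: map the current residual to a state increment.

    W plays the role of learned update weights (assumption: in a real
    system this would be a trained network, not a fixed matrix).
    """
    residual = A @ state - y
    return state + W @ residual


def run_stages(W, A, y, state, n_stages=5):
    """Apply the same stage repeatedly, recording the loss after each one."""
    losses = [loss(A, y, state)]
    for _ in range(n_stages):
        state = update_stage(W, A, y, state)
        losses.append(loss(A, y, state))
    return state, losses


A, y, target = make_toy_problem()
# Stand-in for trained weights: a damped pseudo-inverse, which halves the
# distance to the least-squares solution at every stage.
W = -0.5 * np.linalg.pinv(A)
final_state, losses = run_stages(W, A, y, np.zeros(A.shape[1]))
print(losses)
```

In this toy setting the loss shrinks at every stage because of how `W` was chosen by hand; in a learned system that behavior would have to come from training the update module, which is the role the abstract assigns to meta-regularization.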