Recent work has achieved impressive progress towards the joint reconstruction of hands and manipulated objects from monocular color images. Existing methods focus on two alternative representations: either parametric meshes or signed distance fields (SDFs). On the one hand, parametric models benefit from prior knowledge at the cost of limited shape deformations and mesh resolutions; mesh-based models may hence fail to precisely reconstruct details such as contact surfaces between hands and objects. On the other hand, SDF-based methods can represent arbitrary details but lack explicit priors. In this work we aim to improve SDF models using priors provided by parametric representations. In particular, we propose a joint learning framework that disentangles pose and shape. We obtain hand and object poses from parametric models and use them to align SDFs in 3D space. We show that such aligned SDFs better focus on reconstructing shape details and improve reconstruction accuracy both for hands and for objects. We evaluate our method and demonstrate significant improvements over the state of the art on the challenging ObMan and DexYCB benchmarks.
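To make the core idea concrete, below is a minimal sketch of pose-aligned SDF querying: world-space query points are transformed into a pose-canonical frame using a predicted rigid pose (e.g., obtained from a parametric model such as MANO for the hand), so the SDF decoder only has to model shape details rather than global pose. All names here (`SDFDecoder`, `align_points`) and the architecture are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class SDFDecoder(nn.Module):
    """Small MLP mapping a 3D point plus an image feature to a signed distance."""
    def __init__(self, feat_dim=256, hidden=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3 + feat_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, points, feat):
        # points: (B, N, 3), feat: (B, feat_dim)
        feat = feat[:, None, :].expand(-1, points.shape[1], -1)
        return self.net(torch.cat([points, feat], dim=-1)).squeeze(-1)

def align_points(points, rotation, translation):
    """Map world-space query points into a pose-canonical frame (hypothetical
    alignment step: undo the predicted global rotation and translation)."""
    # points: (B, N, 3); rotation: (B, 3, 3); translation: (B, 3)
    return torch.einsum('bij,bnj->bni', rotation.transpose(1, 2),
                        points - translation[:, None, :])

# Toy usage with random stand-ins for the predicted pose and image feature.
B, N = 2, 1024
points = torch.rand(B, N, 3) * 2 - 1        # query points in world space
feat = torch.rand(B, 256)                   # per-image feature from a CNN backbone
rot = torch.eye(3).expand(B, 3, 3)          # predicted global rotation
trans = torch.zeros(B, 3)                   # predicted wrist/object translation
decoder = SDFDecoder()
sdf = decoder(align_points(points, rot, trans), feat)  # (B, N) signed distances
```

Under this sketch, the alignment removes global pose variation from the decoder's input distribution, which is the intuition behind why pose-aligned SDFs can concentrate capacity on fine shape details.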