In this work we explore reconstructing hand-object interactions in the wild. The core challenge is the lack of appropriate 3D labeled data. To overcome this issue, we propose an optimization-based procedure that does not require direct 3D supervision. The general strategy we adopt is to exploit all available related data (2D bounding boxes, 2D hand keypoints, 2D instance masks, 3D object models, 3D in-the-lab MoCap) to provide constraints for the 3D reconstruction. Rather than optimizing the hand and object individually, we optimize them jointly, which allows us to impose additional constraints based on hand-object contact, collision, and occlusion. Our method produces compelling reconstructions on challenging in-the-wild data from the EPIC Kitchens and 100 Days of Hands datasets, across a range of object categories. Quantitatively, we demonstrate that our approach compares favorably to existing approaches in lab settings where ground-truth 3D annotations are available.
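To make the idea of joint optimization concrete, the following is a minimal toy sketch, not the method's actual energy. It assumes a single 3D hand point, a spherical object, an orthographic camera, and plain gradient descent with numerical gradients; the names `joint_energy`, `numerical_grad`, `optimize`, and `w_contact` are hypothetical. The 2D terms stand in for keypoint and mask evidence, and the contact term illustrates how a hand-object constraint can resolve relative depth that the 2D evidence alone leaves free.

```python
import numpy as np

def joint_energy(params, hand_2d, obj_2d, radius, w_contact=1.0):
    """Toy joint objective: 2D evidence terms plus a hand-object contact term."""
    hand, obj = params[:3], params[3:]
    # 2D evidence (orthographic projection drops z); stand-ins for keypoint/mask losses.
    e_hand = np.sum((hand[:2] - hand_2d) ** 2)
    e_obj = np.sum((obj[:2] - obj_2d) ** 2)
    # Contact: the hand point should lie on the surface of the object sphere.
    gap = np.linalg.norm(hand - obj) - radius
    return e_hand + e_obj + w_contact * gap ** 2

def numerical_grad(f, x, eps=1e-6):
    """Central finite-difference gradient of f at x."""
    g = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        g[i] = (f(x + step) - f(x - step)) / (2 * eps)
    return g

def optimize(hand_2d, obj_2d, radius, init, lr=0.05, steps=2000):
    """Gradient descent over the stacked hand and object positions (6 values)."""
    x = np.asarray(init, dtype=float).copy()
    f = lambda p: joint_energy(p, hand_2d, obj_2d, radius)
    for _ in range(steps):
        x -= lr * numerical_grad(f, x)
    return x

if __name__ == "__main__":
    hand_2d, obj_2d = np.array([0.0, 0.0]), np.array([0.3, 0.0])
    x = optimize(hand_2d, obj_2d, radius=0.5, init=[0.1, 0.1, 0.0, 0.2, -0.1, 1.0])
    print("hand:", x[:3], "object:", x[3:])
    print("hand-object distance:", np.linalg.norm(x[:3] - x[3:]))
```

Because both positions are updated in the same loop, the contact term can move the hand and the object together. Optimizing each one separately against its own 2D evidence would leave their relative depth unconstrained.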