This paper presents an approach that reconstructs a hand-held object from a monocular video. In contrast to many recent methods that directly predict object geometry with a trained network, the proposed approach does not require any learned prior about the object and recovers more accurate and detailed geometry. The key idea is that the hand motion naturally provides multiple views of the object, and this motion can be reliably estimated by a hand pose tracker. The object geometry can then be recovered by solving a multi-view reconstruction problem. We devise an implicit neural representation-based method to solve this problem and address the issues of imprecise hand pose estimation, relative hand-object motion, and insufficient geometry optimization for small objects. We also provide a newly collected dataset with 3D ground truth to validate the proposed approach.
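The geometric core of the key idea can be illustrated with a minimal numpy sketch (this is not the paper's implementation; all names and the noise-free setup are assumptions). If the object is rigidly held, every tracked hand pose acts as a known "camera view" of the same object-frame point, so observations from many frames can be mapped back into the hand frame and fused:

```python
import numpy as np

def rot_z(theta):
    """Rotation about the z-axis; stands in for a tracked hand orientation."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

# Hypothetical object surface point, fixed in the hand's coordinate frame.
p_hand = np.array([0.02, -0.01, 0.03])

# Simulated hand poses from a tracker: (rotation, translation) per frame.
poses = [(rot_z(t), np.array([0.1 * t, 0.0, 0.5])) for t in np.linspace(0, 1.5, 8)]

# Each frame yields a camera-frame observation of the same object point.
obs = [R @ p_hand + t for R, t in poses]

# Multi-view recovery: map each observation back to the hand frame and average.
p_rec = np.mean([R.T @ (x - t) for (R, t), x in zip(poses, obs)], axis=0)
print(np.allclose(p_rec, p_hand))  # prints True: hand motion supplies the views
```

In the paper's actual setting the per-pixel observations come from images rather than known 3D points, and the fusion is done by optimizing an implicit neural surface, but the role of the tracked hand poses as multi-view extrinsics is the same.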