We present a near real-time method for 6-DoF tracking of an unknown object from a monocular RGBD video sequence, while simultaneously performing neural 3D reconstruction of the object. Our method works for arbitrary rigid objects, even when visual texture is largely absent. The object is assumed to be segmented in the first frame only. No additional information is required, and no assumption is made about the interaction agent. Key to our method is a Neural Object Field that is learned concurrently with a pose graph optimization process in order to robustly accumulate information into a consistent 3D representation capturing both geometry and appearance. A dynamic pool of posed memory frames is automatically maintained to facilitate communication between these threads. Our approach handles challenging sequences with large pose changes, partial and full occlusion, untextured surfaces, and specular highlights. We show results on HO3D, YCBInEOAT, and BEHAVE datasets, demonstrating that our method significantly outperforms existing approaches. Project page: https://bundlesdf.github.io