In many applications of advanced robotic manipulation, six degrees of freedom (6DoF) object pose estimates are continuously required. In this work, we develop a multi-modality tracker that fuses information from visual appearance and geometry to estimate object poses. The algorithm extends our previous method ICG, which uses geometry, to additionally consider surface appearance. In general, object surfaces contain local characteristics from text, graphics, and patterns, as well as global differences from distinct materials and colors. To incorporate this visual information, two modalities are developed. For local characteristics, keypoint features are used to minimize distances between points from keyframes and the current image. For global differences, a novel region approach is developed that considers multiple regions on the object surface. In addition, it allows the modeling of external geometries. Experiments on the YCB-Video and OPT datasets demonstrate that our approach ICG+ performs best on both, outperforming conventional as well as deep learning-based methods. At the same time, the algorithm is highly efficient and runs at more than 300 Hz. The source code of our tracker is publicly available.