We present ObjectMatch, a semantic and object-centric camera pose estimator for RGB-D SLAM pipelines. Modern camera pose estimators rely on direct correspondences between overlapping regions of frames; consequently, they cannot align camera frames with little or no overlap. In this work, we propose to leverage indirect correspondences obtained via semantic object identification. For instance, when an object is seen from the front in one frame and from the back in another, we can still provide pose constraints through canonical object correspondences. We first propose a neural network to predict such correspondences at the per-pixel level, which we then combine with state-of-the-art keypoint matching in an energy formulation solved with a joint Gauss-Newton optimization. In a pairwise setting, our method improves the registration recall of state-of-the-art feature matching from 77% to 87% overall, and from 21% to 52% on pairs with 10% or less inter-frame overlap. In registering RGB-D sequences, our method outperforms cutting-edge SLAM baselines in challenging, low frame-rate scenarios, achieving more than a 35% reduction in trajectory error in multiple scenes.
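The joint energy described above can be illustrated with a toy weighted Gauss-Newton solve. This is a minimal sketch only: a 2D rigid pose (rotation angle plus translation) stands in for the full SE(3) camera pose, the correspondence data are synthetic, and the split into "keypoint" and "object" terms with a down-weighting factor is an illustrative assumption, not the paper's actual formulation.

```python
import numpy as np

def rot(theta):
    """2D rotation matrix for angle theta."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def gauss_newton_pose(src, dst, weights, iters=20):
    """Estimate a 2D rigid pose x = (theta, tx, ty) minimizing
    sum_i w_i * || R(theta) @ src_i + t - dst_i ||^2 via Gauss-Newton.
    Each correspondence contributes one weighted residual term, so
    direct (keypoint) and indirect (object) matches mix in one solve."""
    x = np.zeros(3)  # initial guess: identity pose
    for _ in range(iters):
        theta, t = x[0], x[1:]
        # Residuals, shape (N, 2): transformed source minus target.
        r = (src @ rot(theta).T + t) - dst
        # Derivative of R(theta) with respect to theta.
        dR = np.array([[-np.sin(theta), -np.cos(theta)],
                       [ np.cos(theta), -np.sin(theta)]])
        # Jacobian of each 2-vector residual w.r.t. (theta, tx, ty).
        J = np.zeros((len(src), 2, 3))
        J[:, :, 0] = src @ dR.T  # d r / d theta
        J[:, 0, 1] = 1.0         # d r_x / d tx
        J[:, 1, 2] = 1.0         # d r_y / d ty
        Jf = J.reshape(-1, 3)
        rf = r.reshape(-1)
        W = np.repeat(weights, 2)  # one weight per residual component
        # Normal equations of the weighted least-squares step.
        H = Jf.T @ (W[:, None] * Jf)
        g = Jf.T @ (W * rf)
        x -= np.linalg.solve(H, g)
    return x

# Synthetic example: 5 "keypoint" matches and 3 "object" matches
# related by an unknown rigid motion; object terms are down-weighted.
rng = np.random.default_rng(0)
src = rng.uniform(-1, 1, size=(8, 2))
true_pose = np.array([0.6, 0.3, -0.2])  # theta, tx, ty
dst = src @ rot(true_pose[0]).T + true_pose[1:]
weights = np.concatenate([np.ones(5), 0.5 * np.ones(3)])
est = gauss_newton_pose(src, dst, weights)
```

With noise-free correspondences the solver recovers the ground-truth pose; in practice the appeal of the joint formulation is that the object terms keep the system well-constrained even when direct keypoint matches are scarce, as in low-overlap frame pairs.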