Precise 6D pose estimation of rigid objects from RGB images is a critical but challenging task in robotics and augmented reality. To address this problem, we propose DeepRM, a novel recurrent network architecture for 6D pose refinement. DeepRM leverages initial coarse pose estimates to render synthetic images of the target objects. The rendered images are then matched against the observed images to predict a rigid transform that updates the previous pose estimate; repeating this process incrementally refines the estimate at each iteration. LSTM units propagate information through each refinement step, significantly improving overall performance. In contrast to many two-stage Perspective-n-Point-based solutions, DeepRM is trained end-to-end and uses a scalable backbone that can be tuned for accuracy and efficiency via a single parameter. During training, a multi-scale optical flow head is added to predict the optical flow between the observed and synthetic images. Optical flow prediction stabilizes the training process and enforces the learning of features that are relevant to the task of pose estimation. Our results demonstrate that DeepRM achieves state-of-the-art performance on two widely used, challenging datasets.
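To make the render-and-compare loop concrete, the following is a minimal PyTorch sketch of an iterative refinement of this kind, not the paper's actual architecture: the backbone, layer sizes, the `renderer` callable, the axis-angle-plus-translation pose parameterization, and the `apply_delta` composition helper are all illustrative assumptions.

```python
import torch
import torch.nn as nn


def axis_angle_to_matrix(v):
    """Rodrigues rotation via matrix exponential of the skew matrix of v (B, 3)."""
    K = torch.zeros(v.shape[0], 3, 3, device=v.device, dtype=v.dtype)
    K[:, 0, 1], K[:, 0, 2] = -v[:, 2], v[:, 1]
    K[:, 1, 0], K[:, 1, 2] = v[:, 2], -v[:, 0]
    K[:, 2, 0], K[:, 2, 1] = -v[:, 1], v[:, 0]
    return torch.matrix_exp(K)


def apply_delta(pose, d_rot, d_trans):
    """Compose the predicted rigid transform with the current pose (R, t). Assumed helper."""
    R, t = pose
    return axis_angle_to_matrix(d_rot) @ R, t + d_trans


class RefinementStep(nn.Module):
    """One render-and-compare iteration; sizes are placeholders, not the paper's."""

    def __init__(self, feat_dim=256, hidden_dim=256):
        super().__init__()
        # Backbone extracts joint features from the observed and rendered images.
        self.backbone = nn.Sequential(
            nn.Conv2d(6, 64, 7, stride=2, padding=3), nn.ReLU(),
            nn.Conv2d(64, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        # LSTM cell carries state across refinement iterations.
        self.lstm = nn.LSTMCell(feat_dim, hidden_dim)
        # Heads regress a pose update: axis-angle rotation and translation.
        self.rot_head = nn.Linear(hidden_dim, 3)
        self.trans_head = nn.Linear(hidden_dim, 3)

    def forward(self, observed, rendered, state):
        x = torch.cat([observed, rendered], dim=1)   # (B, 6, H, W)
        feat = self.backbone(x).flatten(1)           # (B, feat_dim)
        h, c = self.lstm(feat, state)
        return self.rot_head(h), self.trans_head(h), (h, c)


def refine_pose(model, renderer, observed, pose, num_iters=4):
    """Iteratively render at the current pose, match, and update the estimate."""
    state = None
    for _ in range(num_iters):
        rendered = renderer(pose)                    # synthetic view at current estimate
        d_rot, d_trans, state = model(observed, rendered, state)
        pose = apply_delta(pose, d_rot, d_trans)
    return pose
```

In this sketch `renderer` stands in for any function that produces a synthetic image of the target object at a given pose; the auxiliary multi-scale optical flow head used during training is omitted for brevity.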