Tracking the 6D pose of objects in video sequences is important for robot manipulation. This work presents se(3)-TrackNet, a data-driven optimization approach for long-term 6D pose tracking. It aims to identify the optimal relative pose given the current RGB-D observation and a synthetic image conditioned on the previous best estimate and the object's model. The key contributions in this context are a novel neural network architecture, which appropriately disentangles the feature encoding to help reduce domain shift, and an effective 3D orientation representation via Lie algebra. Consequently, even when the network is trained solely with synthetic data, it can work effectively over real images. Comprehensive experiments over multiple benchmarks show that se(3)-TrackNet achieves consistently robust estimates and outperforms alternatives, even though those alternatives were trained with real images. The approach runs in real time at 90.9 Hz. Code, data and supplementary video for this project are available at https://github.com/wenbowen123/iros20-6d-pose-tracking
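
To make the relative-pose idea concrete, the following is a minimal sketch of how a predicted relative motion, with its rotation expressed as a Lie-algebra (axis-angle) vector, might be turned into a rigid transform and applied to the previous pose estimate. It is an illustration under stated assumptions, not the authors' implementation: the function names (`so3_exp`, `relative_transform`, `update_pose`), the choice to copy the translation directly into the transform, and the left-multiplication update convention are all assumptions made here, and the network that would produce `t` and `w` is omitted entirely.

```python
import numpy as np


def hat(w):
    """Map a 3-vector to its 3x3 skew-symmetric matrix."""
    wx, wy, wz = w
    return np.array([[0.0, -wz, wy],
                     [wz, 0.0, -wx],
                     [-wy, wx, 0.0]])


def so3_exp(w):
    """Rodrigues' formula: axis-angle vector in so(3) -> rotation matrix."""
    w = np.asarray(w, dtype=float)
    theta = np.linalg.norm(w)
    K = hat(w)
    if theta < 1e-8:
        # First-order approximation for very small rotations.
        return np.eye(3) + K
    return (np.eye(3)
            + (np.sin(theta) / theta) * K
            + ((1.0 - np.cos(theta)) / theta ** 2) * (K @ K))


def relative_transform(t, w):
    """Build a 4x4 transform from translation t and rotation vector w.

    Assumption: the translation is used as-is rather than being coupled
    with the rotation as in the full se(3) exponential map.
    """
    T = np.eye(4)
    T[:3, :3] = so3_exp(w)
    T[:3, 3] = np.asarray(t, dtype=float)
    return T


def update_pose(T_prev, t, w):
    """Apply the relative transform to the previous pose (assumed convention)."""
    return relative_transform(t, w) @ T_prev


# Hypothetical usage: previous pose is the identity, and the predicted motion
# is a 1 cm shift along x plus a 90 degree rotation about z.
T_prev = np.eye(4)
T_new = update_pose(T_prev, t=[0.01, 0.0, 0.0], w=[0.0, 0.0, np.pi / 2])
```

Representing the rotation as a three-component vector keeps the predicted quantity unconstrained, while the exponential map guarantees that the resulting matrix is a valid rotation; in a tracking loop, `T_new` would serve as the previous estimate for the next frame.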