Recent progress in object pose prediction provides a promising path for robots to build object-level scene representations during navigation. However, when we deploy a robot in a novel environment, out-of-distribution data can degrade prediction performance. To mitigate the domain gap, we can potentially perform self-training in the target domain, using predictions on robot-captured images as pseudo labels to fine-tune the object pose estimator. Unfortunately, the pose predictions are typically outlier-corrupted, and it is hard to quantify their uncertainties, which can result in low-quality pseudo-labeled data. To address the problem, we propose a SLAM-supported self-training method that leverages robot understanding of the 3D scene geometry to enhance object pose inference performance. Combining the pose predictions with robot odometry, we formulate and solve pose graph optimization to refine the object pose estimates and make pseudo labels more consistent across frames. We incorporate the pose prediction covariances as variables into the optimization to automatically model their uncertainties. This automatic covariance tuning (ACT) process can fit 6D pose prediction noise at the component level, leading to higher-quality pseudo training data. We test our method with the deep object pose estimator (DOPE) on the YCB video dataset and in real robot experiments. It achieves 34.3% and 17.8% improvements in pose prediction accuracy on the two tests, respectively. Our code is available at https://github.com/520xyxyzq/slam-super-6d.
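As a rough illustration of the core optimization, the sketch below builds a pose graph over camera poses and one static object landmark, then alternates between solving it and re-fitting the per-component detection noise from the residuals, loosely mimicking the automatic covariance tuning described above. This is a minimal sketch, not the authors' implementation; it assumes GTSAM's Python bindings, and the key helpers `X`/`L`, the odometry/prior sigmas, and the `refine` routine are illustrative choices.

```python
# Minimal pose-graph sketch (illustrative, not the paper's exact ACT formulation).
import numpy as np
import gtsam

def X(i): return gtsam.symbol('x', i)  # camera pose key for frame i
def L(j): return gtsam.symbol('l', j)  # object landmark key

def build_graph(odometry, detections, det_sigmas):
    """odometry: list of relative Pose3 (frame i -> i+1);
    detections: {frame index: camera-to-object Pose3 from the estimator}."""
    graph = gtsam.NonlinearFactorGraph()
    # Anchor the first camera pose at the origin.
    graph.add(gtsam.PriorFactorPose3(
        X(0), gtsam.Pose3(),
        gtsam.noiseModel.Diagonal.Sigmas(np.full(6, 1e-6))))
    odo_noise = gtsam.noiseModel.Diagonal.Sigmas(np.full(6, 0.01))  # assumed
    for i, rel in enumerate(odometry):
        graph.add(gtsam.BetweenFactorPose3(X(i), X(i + 1), rel, odo_noise))
    det_noise = gtsam.noiseModel.Diagonal.Sigmas(det_sigmas)
    for i, obj_in_cam in detections.items():
        graph.add(gtsam.BetweenFactorPose3(X(i), L(0), obj_in_cam, det_noise))
    return graph

def refine(odometry, detections, rounds=5):
    """Alternate pose-graph solves with per-component noise re-fitting."""
    # Initialize cameras by chaining odometry; landmark from the first detection.
    initial = gtsam.Values()
    pose = gtsam.Pose3()
    initial.insert(X(0), pose)
    for i, rel in enumerate(odometry):
        pose = pose.compose(rel)
        initial.insert(X(i + 1), pose)
    first = min(detections)
    initial.insert(L(0), initial.atPose3(X(first)).compose(detections[first]))
    sigmas = np.full(6, 0.1)  # initial guess for detection noise (rot, trans)
    for _ in range(rounds):
        graph = build_graph(odometry, detections, sigmas)
        result = gtsam.LevenbergMarquardtOptimizer(graph, initial).optimize()
        # Re-fit each of the 6 noise components from the detection residuals.
        errs = [gtsam.Pose3.Logmap(
                    obj.between(result.atPose3(X(i)).between(result.atPose3(L(0)))))
                for i, obj in detections.items()]
        sigmas = np.maximum(np.std(np.array(errs), axis=0), 1e-3)
        initial = result
    # The refined camera-to-object poses can then serve as per-frame pseudo labels.
    return result, sigmas
```

In this sketch the detection noise model is shared across frames and updated from empirical residual spreads, which is one simple way to let the optimizer down-weight unreliable pose-prediction components; the paper's ACT instead treats the covariances as optimization variables.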