Since the introduction of modern deep learning methods for object pose estimation, test accuracy and efficiency have increased significantly. Good performance, however, requires large amounts of annotated training data. While synthetic training data removes the need for manual annotation, a large performance gap currently separates methods trained on real data from those trained on synthetic data. This paper introduces a new method that bridges this gap. Most methods trained on synthetic data operate on 2D images, as domain randomization is more mature in 2D. To obtain precise poses, many of these methods perform a final refinement step using 3D data. Our method instead integrates the 3D data directly into the network to increase the accuracy of the pose estimation. To enable domain randomization in 3D, a sensor-based data augmentation has been developed. Additionally, we introduce the SparseEdge feature, which uses a wider search space during point cloud propagation to avoid relying on specific features, without increasing run-time. Experiments on three large pose estimation benchmarks show that the presented method outperforms previous methods trained on synthetic data and achieves results comparable to existing methods trained on real data.
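To make the SparseEdge idea concrete, below is a minimal sketch of neighbor selection with a widened search space. It assumes SparseEdge follows the EdgeConv pattern of DGCNN, replacing the k nearest neighbors with k neighbors sampled at random from a wider pool of the widen*k nearest points, so the downstream edge-feature computation still processes only k neighbors per point and its cost is unchanged. The function name, the `widen` factor, and the NumPy implementation are illustrative assumptions, not the paper's code.

```python
import numpy as np

def sparse_edge_neighbors(points, k=20, widen=3, rng=None):
    """Sample k neighbor indices per point from its widen*k nearest
    neighbors, instead of taking the k nearest directly (assumed
    SparseEdge-style selection; not the paper's implementation).

    points: (N, 3) array of XYZ coordinates.
    Returns: (N, k) array of neighbor indices.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = points.shape[0]
    pool = min(widen * k, n - 1)
    # Pairwise squared distances (fine for a sketch; use a KD-tree at scale).
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)  # exclude the point itself
    # Indices of the `pool` nearest points for each point (widened pool).
    nearest = np.argpartition(d2, pool, axis=1)[:, :pool]
    # Randomly keep k of the pool for each point.
    pick = rng.permuted(np.tile(np.arange(pool), (n, 1)), axis=1)[:, :k]
    return np.take_along_axis(nearest, pick, axis=1)

pts = np.random.rand(1024, 3).astype(np.float32)
idx = sparse_edge_neighbors(pts, k=20, widen=3)  # (1024, 20)
# EdgeConv-style edge features: concat(x_i, x_j - x_i) per sampled neighbor.
edge = np.concatenate([np.repeat(pts[:, None], 20, 1),
                       pts[idx] - pts[:, None]], axis=-1)  # (1024, 20, 6)
```

Because the random pool resampling varies which local geometry each point sees from batch to batch, the network is discouraged from latching onto any one specific neighbor configuration, matching the abstract's stated goal of avoiding reliance on specific features at no extra run-time.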