In many robotic applications, the environment in which 6-DoF pose estimation of a known, rigid object and its subsequent grasping are performed remains nearly unchanged and may even be known to the robot in advance. In this paper, we refer to this problem as instance-specific pose estimation: the robot is expected to estimate the pose with a high degree of accuracy only in a limited set of familiar scenarios. Minor changes in the scene, such as variations in lighting conditions and background appearance, are acceptable, but drastic alterations are not anticipated. To this end, we present a method to rapidly train and deploy a pipeline for estimating the continuous 6-DoF pose of an object from a single RGB image. The key idea is to leverage known camera poses and rigid-body geometry to partially automate the generation of a large labeled dataset. This dataset, combined with sufficient domain randomization, is then used to supervise the training of deep neural networks that predict semantic keypoints. Experimentally, we demonstrate the convenience and effectiveness of our proposed method in accurately estimating object pose while requiring only a very small amount of manual annotation for training.
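The core of the labeling idea above is that, once an object's 3D keypoints are annotated a single time in the object frame, a known camera pose lets us project them into every image automatically via the pinhole model. The following is a minimal sketch of that projection step; the intrinsics, camera pose, and keypoint coordinates are hypothetical placeholders, not values from the paper.

```python
import numpy as np

def project_keypoints(K, R, t, pts_3d):
    """Project 3D object keypoints (object frame) into pixel coordinates
    using a known camera pose (R, t) and intrinsic matrix K.

    K      : 3x3 camera intrinsics
    R, t   : rotation (3x3) and translation (3,) mapping object -> camera frame
    pts_3d : Nx3 array of keypoint coordinates in the object frame
    Returns an Nx2 array of pixel coordinates (the automatic labels).
    """
    cam = R @ pts_3d.T + t.reshape(3, 1)   # 3xN points in the camera frame
    uv = K @ cam                           # homogeneous image coordinates
    return (uv[:2] / uv[2]).T              # perspective divide -> Nx2 pixels

# Hypothetical example: four keypoints on an object 2 m in front of the camera.
K = np.array([[600.0,   0.0, 320.0],
              [  0.0, 600.0, 240.0],
              [  0.0,   0.0,   1.0]])
R = np.eye(3)                              # camera axis-aligned with the object
t = np.array([0.0, 0.0, 2.0])
pts = np.array([[0.0, 0.0, 0.0],
                [0.1, 0.0, 0.0],
                [0.0, 0.1, 0.0],
                [0.0, 0.0, 0.1]])
labels = project_keypoints(K, R, t, pts)   # keypoint labels for this view
```

Repeating this projection for every image in a calibrated capture sequence yields the large labeled dataset with only one manual 3D annotation per object; the inverse problem at test time (recovering pose from predicted 2D keypoints) is the standard PnP formulation.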