Object pose estimation has important applications in areas such as robotic grasping and augmented reality. We present a new method for estimating the 6D pose of objects that improves on the accuracy of existing approaches while remaining fast enough for real-time use. The method takes RGB-D data as input, segments the objects, and estimates their poses. It uses a neural network with three heads: the first classifies the object and generates its mask, the second estimates the translation vector, and the third estimates the quaternion that represents the object's rotation. All three heads build on a pyramid architecture used during feature extraction and feature fusion. With an inference time of 0.12 seconds, the method combines fast inference with high accuracy, which makes it suitable for robotic pick-and-place tasks and augmented reality applications.
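To illustrate how the outputs of the translation and rotation heads could be combined into a single 6D pose, the following minimal sketch assembles a 4x4 homogeneous transform from a quaternion and a translation vector. It is not the network itself and is not taken from the method's implementation. The function names, the `(w, x, y, z)` quaternion ordering, and the example values are assumptions made for illustration only.

```python
import math


def quaternion_to_rotation_matrix(q):
    """Convert a quaternion to a 3x3 rotation matrix (nested lists).

    Assumption: q is ordered (w, x, y, z). It is normalised here because a
    regression head is not guaranteed to output a unit quaternion.
    """
    w, x, y, z = q
    norm = math.sqrt(w * w + x * x + y * y + z * z)
    if norm == 0.0:
        raise ValueError("a zero-norm quaternion does not define a rotation")
    w, x, y, z = w / norm, x / norm, y / norm, z / norm
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]


def pose_matrix(q, t):
    """Assemble a 4x4 homogeneous transform from quaternion q and translation t."""
    r = quaternion_to_rotation_matrix(q)
    return [
        r[0] + [t[0]],
        r[1] + [t[1]],
        r[2] + [t[2]],
        [0.0, 0.0, 0.0, 1.0],
    ]


def transform_point(pose, p):
    """Map a 3D point p from the object frame into the camera frame."""
    x, y, z = p
    return [row[0] * x + row[1] * y + row[2] * z + row[3] for row in pose[:3]]


# Hypothetical head outputs: a 90 degree rotation about the z axis and a
# translation of half a metre along the camera's viewing direction.
half_angle = math.pi / 4
example_quaternion = (math.cos(half_angle), 0.0, 0.0, math.sin(half_angle))
example_translation = (0.1, 0.2, 0.5)
example_pose = pose_matrix(example_quaternion, example_translation)
```

In a pick-and-place setting, a transform like `example_pose` is what a grasp planner would typically consume: points defined on the object model, such as grasp locations, can be mapped into the camera frame with `transform_point`.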