6D object pose estimation has long been a research topic in computer vision and robotics. Many real-world applications, such as robotic grasping, manipulation, and autonomous navigation, require the correct pose of the objects present in a scene to perform their specific tasks. The problem becomes even harder when the objects are placed in a cluttered scene and the level of occlusion is high. Prior works have tried to overcome this problem but have not achieved accuracy that can be considered reliable in real-world applications. In this paper, we present an architecture that, unlike prior work, is context-aware: it exploits the context information available about the objects. Our proposed architecture treats objects separately according to their type, i.e., symmetric and non-symmetric. A deeper estimator and refiner network pair is used for non-symmetric objects than for symmetric ones, owing to their intrinsic differences. Our experiments show an improvement in accuracy of about 3.2% on the LineMOD dataset, which is considered a benchmark for pose estimation in occluded and cluttered scenes, over the prior state-of-the-art DenseFusion. Our results also show that the inference time is low enough for real-time usage.
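To make the symmetric/non-symmetric routing concrete, below is a minimal PyTorch-style sketch of the idea described above: object crops are dispatched to an estimator/refiner pair whose depth depends on whether the object class is symmetric. The class names, layer widths, output parameterization, and the `symmetric_ids` set are illustrative assumptions, not the authors' implementation.

```python
# Sketch only: route objects to a shallower or deeper estimator/refiner pair
# based on symmetry. All sizes and names are assumptions for illustration.
import torch
import torch.nn as nn

def mlp(dims):
    """Stack Linear+ReLU layers; the final layer has no activation."""
    layers = []
    for i in range(len(dims) - 2):
        layers += [nn.Linear(dims[i], dims[i + 1]), nn.ReLU()]
    layers.append(nn.Linear(dims[-2], dims[-1]))
    return nn.Sequential(*layers)

class ContextAwarePoseHead(nn.Module):
    def __init__(self, feat_dim=1024, symmetric_ids=(7, 8)):
        super().__init__()
        self.symmetric_ids = set(symmetric_ids)
        # Shallower pair for symmetric objects, deeper pair for
        # non-symmetric objects (more pose ambiguity to resolve).
        self.sym_estimator = mlp([feat_dim, 512, 7])            # quaternion (4) + translation (3)
        self.sym_refiner = mlp([feat_dim + 7, 512, 7])
        self.nonsym_estimator = mlp([feat_dim, 1024, 512, 256, 7])
        self.nonsym_refiner = mlp([feat_dim + 7, 1024, 512, 256, 7])

    def forward(self, feat, obj_id):
        if int(obj_id) in self.symmetric_ids:
            est, ref = self.sym_estimator, self.sym_refiner
        else:
            est, ref = self.nonsym_estimator, self.nonsym_refiner
        pose = est(feat)                                        # initial pose estimate
        pose = pose + ref(torch.cat([feat, pose], dim=-1))      # one refinement step
        return pose

# Usage: one fused feature vector per object crop.
feat = torch.randn(1, 1024)
pose = ContextAwarePoseHead()(feat, obj_id=3)                   # takes the non-symmetric branch
```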