We propose a novel keypoint voting scheme based on intersecting spheres that is more accurate than existing schemes and allows for a smaller set of more dispersed keypoints. The scheme forms the basis of the proposed RCVPose method for 6 DoF pose estimation of 3D objects in RGB-D data, which is particularly effective at handling occlusions. A CNN is trained to estimate the distance between the 3D point corresponding to the depth mode of each RGB pixel and each of a set of 3 dispersed keypoints defined in the object frame. At inference, a sphere with radius equal to this estimated distance is generated, centered at each 3D point. The surfaces of these spheres vote to increment a 3D accumulator space, the peaks of which indicate keypoint locations. The proposed radial voting scheme is more accurate than previous vector or offset schemes, and is robust to dispersed keypoints. Experiments demonstrate RCVPose to be highly accurate and competitive, achieving state-of-the-art results on the LINEMOD (99.7%) and YCB-Video (97.2%) datasets, and notably scoring 71.1% on the challenging Occlusion LINEMOD dataset, 7.9% higher than previous methods.
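The radial voting step described above can be illustrated with a minimal sketch. It is not the RCVPose implementation: the function names `radial_vote` and `peak_location`, the dense voxel grid, and the half-voxel shell tolerance are all assumptions made for illustration, and the CNN's distance estimates are replaced by exact distances to a known keypoint.

```python
import numpy as np

def radial_vote(points, radii, grid_min, voxel_size, grid_shape):
    """Accumulate sphere-surface votes in a 3D voxel grid.

    points: (N, 3) 3D scene points (sphere centres).
    radii:  (N,) estimated point-to-keypoint distances (sphere radii).
    Hypothetical helper: a voxel receives a vote from a sphere when its
    centre lies within half a voxel of that sphere's surface.
    """
    grid_min = np.asarray(grid_min, dtype=float)
    # Coordinates of every voxel centre, flattened to (V, 3).
    idx = np.indices(grid_shape).reshape(3, -1).T
    centres = grid_min + (idx + 0.5) * voxel_size
    acc = np.zeros(len(centres), dtype=np.int64)
    for p, r in zip(np.asarray(points, dtype=float), radii):
        dist = np.linalg.norm(centres - p, axis=1)
        acc[np.abs(dist - r) <= 0.5 * voxel_size] += 1
    return acc.reshape(grid_shape)

def peak_location(acc, grid_min, voxel_size):
    """Return the centre of the voxel with the most votes."""
    peak = np.unravel_index(np.argmax(acc), acc.shape)
    return np.asarray(grid_min, dtype=float) + (np.array(peak) + 0.5) * voxel_size

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    grid_min, voxel_size, grid_shape = (0.0, 0.0, 0.0), 0.05, (20, 20, 20)
    # Synthetic keypoint placed at a voxel centre (an assumption for the demo).
    keypoint = np.array([0.525, 0.425, 0.625])
    points = rng.uniform(0.0, 1.0, size=(40, 3))
    radii = np.linalg.norm(points - keypoint, axis=1)  # stand-in for CNN output
    acc = radial_vote(points, radii, grid_min, voxel_size, grid_shape)
    print(peak_location(acc, grid_min, voxel_size))
```

Each sphere alone is ambiguous about the keypoint's position, but the surfaces of many spheres intersect only near the true keypoint, so the accumulator peak localizes it. This dense-grid version costs one distance computation per voxel per point; it is meant to show the idea, not to be efficient.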