Object pose estimation is a critical task in robotics for precise object manipulation. However, current techniques rely heavily on a reference 3D object, which limits their generalizability and makes expansion to new object categories expensive. Direct pose predictions also provide limited information for robotic grasping without reference to the 3D model. Keypoint-based methods offer intrinsic descriptiveness without relying on an exact 3D model, but they may lack consistency and accuracy. To address these challenges, this paper proposes ShapeShift, a superquadric-based framework for object pose estimation that predicts the object's pose relative to a primitive shape fitted to the object. The proposed framework offers intrinsic descriptiveness and the ability to generalize to arbitrary geometric shapes beyond the training set.
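To make the notion of a fitted primitive concrete, the sketch below evaluates the standard superquadric inside-outside function F, where a point lies on the primitive's surface when F = 1 (inside when F < 1, outside when F > 1). This is an illustrative implementation of the general superquadric formulation, not code from the ShapeShift framework itself; the parameter names (`a` for the three semi-axis lengths, `eps` for the two shape exponents) are our own.

```python
import numpy as np

def superquadric_implicit(points, a, eps):
    """Inside-outside function of a superquadric primitive.

    points : (N, 3) array of 3D points in the primitive's local frame.
    a      : (a1, a2, a3) semi-axis lengths along x, y, z.
    eps    : (eps1, eps2) shape exponents (1, 1 gives an ellipsoid;
             small values approach a box).
    Returns F with F < 1 inside, F = 1 on the surface, F > 1 outside.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    a1, a2, a3 = a
    e1, e2 = eps
    # Absolute values keep fractional exponents well-defined for negative coords.
    xy = (np.abs(x / a1) ** (2.0 / e2) + np.abs(y / a2) ** (2.0 / e2)) ** (e2 / e1)
    return xy + np.abs(z / a3) ** (2.0 / e1)

# With unit axes and eps = (1, 1), F reduces to x^2 + y^2 + z^2 (a unit sphere).
pts = np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 0.0]])
print(superquadric_implicit(pts, (1.0, 1.0, 1.0), (1.0, 1.0)))
```

Fitting such a primitive to an observed point cloud (e.g. by minimizing a residual of F over the points) yields the reference shape relative to which the object's pose can then be expressed.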