Estimating the 6D pose of objects is beneficial for robotics tasks such as transportation, autonomous navigation, and manipulation, as well as in scenarios beyond robotics such as virtual and augmented reality. Compared with single-image pose estimation, pose tracking takes into account temporal information across multiple frames to overcome possible detection inconsistencies and to improve pose estimation efficiency. In this work, we introduce a novel Deep Neural Network (DNN) called VIPose that combines inertial and camera data to address the object pose tracking problem in real time. The key contribution is the design of a novel DNN architecture that fuses visual and inertial features to predict the objects' relative 6D pose between consecutive image frames. The overall 6D pose is then estimated by successively composing the relative poses. Our approach shows remarkable pose estimation results for heavily occluded objects, which are well known to be very challenging for existing state-of-the-art solutions. The effectiveness of the proposed approach is validated on a new dataset called VIYCB, which contains RGB images, IMU data, and accurate 6D pose annotations created with an automated labeling technique. The approach achieves accuracy comparable to state-of-the-art techniques, with the additional benefit of running in real time.
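The step of chaining relative poses into an overall pose can be illustrated with a minimal sketch. It is not the VIPose implementation: it assumes poses are 4x4 homogeneous transforms, that each relative pose left-multiplies the previous absolute pose (the abstract does not state the composition convention), and the helper names `make_pose`, `rot_z`, and `accumulate_poses` are hypothetical.

```python
import numpy as np


def make_pose(R, t):
    """Build a 4x4 homogeneous transform from a 3x3 rotation and a 3-vector."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T


def rot_z(theta):
    """Rotation by theta radians about the z axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def accumulate_poses(T0, relative_poses):
    """Chain relative poses onto an initial pose.

    Assumed convention (not taken from the paper): T_k = dT_k @ T_{k-1}.
    Returns the list [T_0, T_1, ..., T_n].
    """
    poses = [T0]
    for dT in relative_poses:
        poses.append(dT @ poses[-1])
    return poses


# Object starts 1 unit along x; two relative rotations of 45 degrees about z.
T0 = make_pose(np.eye(3), [1.0, 0.0, 0.0])
steps = [make_pose(rot_z(np.pi / 4), [0.0, 0.0, 0.0])] * 2
trajectory = accumulate_poses(T0, steps)
final_pose = trajectory[-1]
```

Because each absolute pose is a product of all earlier relative estimates, errors in the relative poses accumulate over time, so a tracker built this way depends on a good initial pose and accurate per-frame predictions.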