Estimating the 6D pose of objects benefits robotics tasks such as transportation, autonomous navigation, and manipulation, as well as applications beyond robotics such as virtual and augmented reality. Compared with single-image pose estimation, pose tracking exploits temporal information across multiple frames to overcome detection inconsistencies and to improve the efficiency of pose estimation. In this work, we introduce VIPose, a novel Deep Neural Network (DNN) that combines inertial and camera data to address the object pose tracking problem in real time. The key contribution is a DNN architecture that fuses visual and inertial features to predict an object's relative 6D pose between consecutive image frames. The overall 6D pose is then estimated by successively composing these relative poses. Our approach produces strong pose estimation results for heavily occluded objects, which are well known to be challenging for existing state-of-the-art solutions. The effectiveness of the proposed approach is validated on a new dataset called VIYCB, which contains RGB images, IMU data, and accurate 6D pose annotations created with an automated labeling technique. The approach achieves accuracy comparable to state-of-the-art techniques, with the additional benefit of running in real time.
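The step of chaining relative poses into an overall pose can be illustrated with a minimal sketch. It is not the VIPose implementation: the function names (`matmul4`, `make_pose`, `accumulate_poses`) are hypothetical, poses are assumed to be 4x4 homogeneous transforms, and the convention that each relative pose is applied on the left of the previous pose is an assumption, since the abstract does not specify one.

```python
import math


def matmul4(a, b):
    """Multiply two 4x4 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def make_pose(yaw, translation):
    """Build a 4x4 homogeneous transform from a rotation about the z axis
    (radians) and a 3D translation. Illustrative helper only."""
    c, s = math.cos(yaw), math.sin(yaw)
    tx, ty, tz = translation
    return [[c, -s, 0.0, tx],
            [s, c, 0.0, ty],
            [0.0, 0.0, 1.0, tz],
            [0.0, 0.0, 0.0, 1.0]]


def accumulate_poses(initial_pose, relative_poses):
    """Chain per-frame relative poses onto an initial pose.

    Assumed convention: T_k = delta_k * T_{k-1}, i.e. each relative
    transform is applied on the left of the previous pose.
    """
    pose = initial_pose
    trajectory = [pose]
    for delta in relative_poses:
        pose = matmul4(delta, pose)
        trajectory.append(pose)
    return trajectory


# Object starts 1 unit along x, then rotates 90 degrees about z twice.
initial = make_pose(0.0, (1.0, 0.0, 0.0))
deltas = [make_pose(math.pi / 2, (0.0, 0.0, 0.0))] * 2
trajectory = accumulate_poses(initial, deltas)
final_pose = trajectory[-1]
```

Because every estimate is built on the previous one, any error in a relative pose carries forward into all later poses under this kind of composition.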