6D object pose tracking has been extensively studied in the robotics and computer vision communities. The most promising solutions, leveraging deep neural networks and/or filtering and optimization, exhibit notable performance on standard benchmarks. However, to the best of our knowledge, these have not been tested thoroughly against fast object motions. Tracking performance in this scenario degrades significantly, especially for methods that do not achieve real-time performance and introduce non-negligible delays. In this work, we introduce ROFT, a Kalman filtering approach for 6D object pose and velocity tracking from a stream of RGB-D images. By leveraging real-time optical flow, ROFT synchronizes the delayed outputs of low-frame-rate Convolutional Neural Networks for instance segmentation and 6D object pose estimation with the RGB-D input stream, achieving fast and precise 6D object pose and velocity tracking. We test our method on a newly introduced photorealistic dataset, Fast-YCB, which comprises fast-moving objects from the YCB model set, and on HO-3D, a dataset for hand and object pose estimation. Results demonstrate that our approach outperforms state-of-the-art methods for 6D object pose tracking, while also providing 6D object velocity tracking. A video showing the experiments is provided as supplementary material.
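To make the filtering idea concrete, the sketch below shows a minimal constant-velocity Kalman filter that jointly estimates position and linear velocity from position-only measurements. This is an illustrative simplification, not the ROFT filter itself: the actual method also tracks orientation and angular velocity, and uses optical flow to synchronize delayed, low-frame-rate network outputs with the RGB-D stream, all of which are omitted here. All function names and parameter choices are our own.

```python
import numpy as np

def make_cv_model(dt, dim=3):
    """Constant-velocity linear model: state x = [position; velocity],
    measurement z = position only (illustrative assumption)."""
    F = np.eye(2 * dim)
    F[:dim, dim:] = dt * np.eye(dim)          # p_{k+1} = p_k + dt * v_k
    H = np.hstack([np.eye(dim), np.zeros((dim, dim))])
    return F, H

def kf_step(x, P, z, F, H, Q, R):
    """One Kalman filter predict/update cycle."""
    # Predict.
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    # Update with position measurement z.
    S = H @ P_pred @ H.T + R                  # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)       # Kalman gain
    x_new = x_pred + K @ (z - H @ x_pred)
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new
```

Fed a sequence of position measurements from a moving object, the filter's velocity components converge toward the true velocity, which is the mechanism that lets a pose tracker also report object velocity.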