We present a challenging, realistic new dataset for evaluating 6-DOF object tracking algorithms. Existing datasets have serious limitations, notably unrealistic synthetic data or real data with large fiducial markers, which prevent the community from obtaining an accurate picture of the state of the art. Our key contribution is a novel pipeline for acquiring accurate ground truth poses of real objects with respect to a Kinect V2 sensor by using a commercial motion capture system. A total of 100 calibrated sequences of real objects are acquired in three different scenarios to evaluate tracker performance: stability, robustness to occlusion, and accuracy during challenging interactions between a person and the object. We conduct an extensive study of a deep 6-DOF tracking architecture and determine a set of optimal parameters. We enhance the architecture and the training methodology to train a 6-DOF tracker that can robustly generalize to objects never seen during training, and demonstrate favorable performance compared to previous approaches trained specifically on the objects to track.