This paper introduces a dataset for training and evaluating methods for 6D pose estimation of hand-held tools in task demonstrations captured by a standard RGB camera. Despite significant progress in 6D pose estimation methods, their performance is usually limited for heavily occluded objects, a common case in imitation learning, where the object is typically partially occluded by the manipulating hand. Currently, there is a lack of datasets that would enable the development of robust 6D pose estimation methods for these conditions. To overcome this problem, we collect a new dataset (Imitrob) aimed at 6D pose estimation in imitation learning and other applications where a human holds a tool and performs a task. The dataset contains image sequences of nine different tools and twelve manipulation tasks, captured from two camera viewpoints and performed by four human subjects with either the left or right hand. Each image is accompanied by an accurate ground-truth measurement of the 6D object pose obtained with the HTC Vive motion tracking device. The use of the dataset is demonstrated by training and evaluating a recent 6D object pose estimation method (DOPE) in various setups.