This paper introduces a dataset for training and evaluating methods for 6D pose estimation of hand-held tools in task demonstrations captured by a standard RGB camera. Despite the significant progress of 6D pose estimation methods, their performance is usually limited for heavily occluded objects, a common situation in imitation learning, where the object is typically partially occluded by the manipulating hand. Datasets that would enable the development of robust 6D pose estimation methods under these conditions are currently lacking. To overcome this problem, we collect a new dataset (Imitrob) aimed at 6D pose estimation in imitation learning and other applications where a human holds a tool and performs a task. The dataset contains image sequences of three different tools and six manipulation tasks, captured from two camera viewpoints and performed by four human subjects with either the left or the right hand. Each image is accompanied by an accurate ground truth measurement of the 6D object pose, obtained with the HTC Vive motion tracking device. The use of the dataset is demonstrated by training and evaluating a recent 6D object pose estimation method (DOPE) in various setups. The dataset and code are publicly available at http://imitrob.ciirc.cvut.cz/imitrobdataset.php.