With the emergence of collaborative robots (cobots), human-robot collaboration in industrial manufacturing is coming into focus. For a cobot to act autonomously and as an assistant, it must understand human actions during assembly. To effectively train models for this task, a dataset containing suitable assembly actions in a realistic setting is crucial. For this purpose, we present the ATTACH dataset, which contains 51.6 hours of assembly with 95.2k annotated fine-grained actions monitored by three cameras, which represent potential viewpoints of a cobot. Since workers in an assembly context tend to perform different actions with their two hands simultaneously, we annotated the performed actions for each hand separately. As a result, more than 68% of the annotations in the ATTACH dataset overlap with other annotations, which is many times more than in related datasets, which typically feature simpler assembly tasks. For better generalization with respect to the background of the working area, we not only recorded color and depth images but also used the Azure Kinect Body Tracking SDK to estimate 3D skeletons of the worker. To create a first baseline, we report the performance of state-of-the-art methods for action recognition as well as action detection on video and skeleton-sequence inputs. The dataset is available at https://www.tu-ilmenau.de/neurob/data-sets-code/attach-dataset .
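As background for the skeleton modality mentioned above, the following is a minimal sketch of how 3D skeletons can be obtained with the Azure Kinect Body Tracking SDK (C API). It is illustrative only and not part of the dataset release; the camera configuration values are assumptions and need not match those used during recording, and error handling is reduced to asserts for brevity.

```c
// Sketch: grab one capture from an Azure Kinect and extract 3D skeleton joints
// with the Body Tracking SDK. Assumed configuration values are illustrative.
#include <assert.h>
#include <stdio.h>
#include <k4a/k4a.h>
#include <k4abt.h>

int main(void)
{
    // Open the device and start depth + color streams.
    k4a_device_t device = NULL;
    assert(k4a_device_open(K4A_DEVICE_DEFAULT, &device) == K4A_RESULT_SUCCEEDED);

    k4a_device_configuration_t config = K4A_DEVICE_CONFIG_INIT_DISABLE_ALL;
    config.depth_mode = K4A_DEPTH_MODE_NFOV_UNBINNED;      // assumed depth mode
    config.color_resolution = K4A_COLOR_RESOLUTION_1080P;  // assumed color resolution
    assert(k4a_device_start_cameras(device, &config) == K4A_RESULT_SUCCEEDED);

    // The body tracker requires the calibration matching the stream configuration.
    k4a_calibration_t calibration;
    assert(k4a_device_get_calibration(device, config.depth_mode, config.color_resolution,
                                      &calibration) == K4A_RESULT_SUCCEEDED);

    k4abt_tracker_t tracker = NULL;
    k4abt_tracker_configuration_t tracker_config = K4ABT_TRACKER_CONFIG_DEFAULT;
    assert(k4abt_tracker_create(&calibration, tracker_config, &tracker) == K4A_RESULT_SUCCEEDED);

    // Feed a single capture to the tracker and pop the resulting body frame.
    k4a_capture_t capture = NULL;
    assert(k4a_device_get_capture(device, &capture, K4A_WAIT_INFINITE) == K4A_WAIT_RESULT_SUCCEEDED);
    assert(k4abt_tracker_enqueue_capture(tracker, capture, K4A_WAIT_INFINITE) == K4A_WAIT_RESULT_SUCCEEDED);

    k4abt_frame_t body_frame = NULL;
    assert(k4abt_tracker_pop_result(tracker, &body_frame, K4A_WAIT_INFINITE) == K4A_WAIT_RESULT_SUCCEEDED);

    // Each detected body yields a skeleton of 32 joints with 3D positions in
    // millimeters (depth camera coordinates) and per-joint confidence levels.
    uint32_t num_bodies = k4abt_frame_get_num_bodies(body_frame);
    for (uint32_t i = 0; i < num_bodies; i++)
    {
        k4abt_skeleton_t skeleton;
        assert(k4abt_frame_get_body_skeleton(body_frame, i, &skeleton) == K4A_RESULT_SUCCEEDED);
        k4a_float3_t wrist = skeleton.joints[K4ABT_JOINT_WRIST_LEFT].position;
        printf("body %u: left wrist at (%.1f, %.1f, %.1f) mm\n",
               i, wrist.v[0], wrist.v[1], wrist.v[2]);
    }

    // Release resources in reverse order of creation.
    k4abt_frame_release(body_frame);
    k4a_capture_release(capture);
    k4abt_tracker_shutdown(tracker);
    k4abt_tracker_destroy(tracker);
    k4a_device_stop_cameras(device);
    k4a_device_close(device);
    return 0;
}
```

In a real recording pipeline the capture/enqueue/pop steps would run in a loop over the whole session, and the per-joint positions would be serialized per frame to form the skeleton sequences used as model input.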