Wearable cameras allow the acquisition of images and videos from the user's perspective. These data can be processed to understand human behavior. Although human behavior analysis has been thoroughly investigated in third person vision, it is still understudied in egocentric settings and, in particular, in industrial scenarios. To encourage research in this field, we present MECCANO, a multimodal dataset of egocentric videos to study human behavior understanding in industrial-like settings. The multimodality is characterized by the presence of gaze signals, depth maps and RGB videos acquired simultaneously with a custom headset. The dataset has been explicitly labeled for fundamental tasks in the context of human behavior understanding from a first person view, such as recognizing and anticipating human-object interactions. With the MECCANO dataset, we explored five different tasks: 1) Action Recognition, 2) Active Objects Detection and Recognition, 3) Egocentric Human-Objects Interaction Detection, 4) Action Anticipation and 5) Next-Active Objects Detection. We propose a benchmark aimed at studying human behavior in the considered industrial-like scenario, which demonstrates that the investigated tasks and the considered scenario are challenging for state-of-the-art algorithms. To support research in this field, we publicly release the dataset at https://iplab.dmi.unict.it/MECCANO/.