This paper presents a new method for describing spatio-temporal relations between objects and hands, in order to recognize both interactions and activities in video demonstrations of manual tasks. The approach exploits Scene Graphs to extract key interaction features from image sequences, simultaneously encoding motion patterns and context. In addition, the method introduces an event-based automatic video segmentation and clustering procedure that groups similar events and detects on the fly whether a monitored activity is executed correctly. The effectiveness of the approach is demonstrated in two multi-subject experiments, which show the ability to recognize and cluster hand-object and object-object interactions without prior knowledge of the activity, and to match the same activity performed by different subjects.
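To make the core idea concrete, the following is a minimal sketch (not the authors' implementation) of a per-frame scene graph whose edges encode coarse spatial relations between hands and objects, with events detected as changes in graph structure. All names, thresholds, and the distance-based relation rule here are illustrative assumptions.

```python
# Illustrative sketch, assuming detections (label + centroid) per frame.
# Thresholds and relation names are assumptions, not the paper's values.
from dataclasses import dataclass
from itertools import combinations
from math import dist
from typing import Optional

@dataclass(frozen=True)
class Node:
    label: str            # e.g. "hand", "cup" (assumed given by a detector)
    center: tuple         # (x, y) centroid in pixels

TOUCH_THRESH = 20.0       # assumed pixel distance for a "touching" relation
NEAR_THRESH = 80.0        # assumed pixel distance for a "near" relation

def spatial_relation(a: Node, b: Node) -> Optional[str]:
    """Map a pairwise distance to a coarse spatial relation, or None."""
    d = dist(a.center, b.center)
    if d < TOUCH_THRESH:
        return "touching"
    if d < NEAR_THRESH:
        return "near"
    return None

def scene_graph(nodes: list) -> frozenset:
    """Build one frame's scene graph as a set of (label, relation, label) edges."""
    edges = set()
    for a, b in combinations(nodes, 2):
        rel = spatial_relation(a, b)
        if rel is not None:
            edges.add((a.label, rel, b.label))
    return frozenset(edges)

def segment_events(graphs: list) -> list:
    """Event-based segmentation: indices of frames where graph topology changes."""
    return [i for i in range(1, len(graphs)) if graphs[i] != graphs[i - 1]]

# Toy usage: a hand approaching and then grasping a cup yields two events.
frames = [
    [Node("hand", (0, 0)),   Node("cup", (200, 0))],
    [Node("hand", (150, 0)), Node("cup", (200, 0))],   # "near" edge appears
    [Node("hand", (195, 0)), Node("cup", (200, 0))],   # "touching" edge appears
]
graphs = [scene_graph(f) for f in frames]
print(segment_events(graphs))  # -> [1, 2]
```

Under this sketch, each segmented event is a graph transition; similar events could then be grouped by comparing their edge sets, which loosely mirrors the clustering step described above.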