In this paper we propose a new framework, named InteractionGCN, to categorize social interactions in egocentric videos. Our method extracts patterns of relational and non-relational cues at the frame level and uses them to build a relational graph, from which the frame-level interactional context is estimated via a Graph Convolutional Network. This context is then propagated over time, together with first-person motion information, through a Gated Recurrent Unit architecture. Ablation studies and experimental evaluation on two publicly available datasets validate the proposed approach and establish state-of-the-art results.
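To make the two-stage design concrete, the sketch below shows one plausible instantiation in PyTorch: a per-frame GCN over the relational graph, frame-level pooling, and a GRU that propagates the pooled context together with first-person motion features. The layer sizes, the mean pooling, and the concatenation-based fusion of motion features are illustrative assumptions, not the paper's exact design.

```python
# A minimal sketch of the described pipeline, assuming PyTorch.
# Dimensions, pooling, and motion-feature fusion are illustrative guesses.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GCNLayer(nn.Module):
    """One graph convolution: H' = ReLU(D^-1/2 (A+I) D^-1/2 H W)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, h, adj):
        # adj: (N, N) adjacency of the per-frame relational graph
        a_hat = adj + torch.eye(adj.size(0), device=adj.device)  # self-loops
        d_inv_sqrt = a_hat.sum(dim=1).pow(-0.5)
        a_norm = d_inv_sqrt.unsqueeze(1) * a_hat * d_inv_sqrt.unsqueeze(0)
        return F.relu(self.linear(a_norm @ h))

class InteractionGCNSketch(nn.Module):
    def __init__(self, cue_dim, motion_dim, hidden_dim, num_classes):
        super().__init__()
        self.gcn1 = GCNLayer(cue_dim, hidden_dim)
        self.gcn2 = GCNLayer(hidden_dim, hidden_dim)
        # GRU consumes per-frame graph context concatenated with
        # first-person motion features (fusion choice is an assumption).
        self.gru = nn.GRU(hidden_dim + motion_dim, hidden_dim, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, node_feats, adjs, motion):
        # node_feats: list of T tensors (N_t, cue_dim), per-frame cue patterns
        # adjs:       list of T tensors (N_t, N_t), per-frame relational graphs
        # motion:     (T, motion_dim) first-person motion features
        contexts = []
        for h, a in zip(node_feats, adjs):
            h = self.gcn2(self.gcn1(h, a), a)
            contexts.append(h.mean(dim=0))  # pool nodes -> frame-level context
        seq = torch.cat([torch.stack(contexts), motion], dim=1).unsqueeze(0)
        _, last = self.gru(seq)             # propagate context over time
        return self.classifier(last.squeeze(0))

# Usage with dummy data: 8 frames, 3 people per frame, 5 interaction classes.
T, cue_dim, motion_dim = 8, 16, 4
model = InteractionGCNSketch(cue_dim, motion_dim, hidden_dim=32, num_classes=5)
node_feats = [torch.randn(3, cue_dim) for _ in range(T)]
adjs = [torch.ones(3, 3) for _ in range(T)]
logits = model(node_feats, adjs, torch.randn(T, motion_dim))  # (1, 5)
```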