Interaction group detection has previously been addressed with bottom-up approaches that relied on the position and orientation information of individuals. These approaches were primarily based on pairwise affinity matrices and were limited to static, third-person views. Because individuals who form interaction groups exhibit an inherent spatial configuration, this problem can greatly benefit from a holistic approach based on Graph Neural Networks (GNNs) that goes beyond pairwise relationships. Our proposed method, GROup detection With Link prediction (GROWL), demonstrates the effectiveness of a GNN-based approach. GROWL predicts the link between two individuals by generating a feature embedding based on their neighbourhood in the graph and determines whether they are connected with a shallow binary classifier such as a Multi-Layer Perceptron (MLP). We test our method against other state-of-the-art group detection approaches on both a third-person view dataset and a robocentric (i.e., egocentric) dataset. In addition, we propose a multimodal approach based on RGB and depth data to compute a representation GROWL can use as input. Our results show that a GNN-based approach can significantly improve accuracy across different camera views, i.e., third-person and egocentric views.
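The core idea described above, embedding each individual from its graph neighbourhood and then classifying pairs with a shallow MLP, can be illustrated with a minimal sketch. This is not the authors' implementation: the mean-pooling aggregation, the network sizes, and all function names are illustrative assumptions, standing in for whatever GNN layer and classifier GROWL actually uses.

```python
import numpy as np

def aggregate_neighbourhood(features, adjacency, node):
    # One GNN-style hop: mean-pool the features of a node and its
    # graph neighbours into a single embedding (illustrative choice).
    neighbours = np.where(adjacency[node] > 0)[0]
    group = np.concatenate(([node], neighbours))
    return features[group].mean(axis=0)

def mlp_link_score(emb_u, emb_v, w1, b1, w2, b2):
    # Shallow binary classifier on the concatenated pair embedding;
    # the sigmoid output is read as the probability of a link
    # (i.e., the two individuals belonging to the same group).
    x = np.concatenate([emb_u, emb_v])
    h = np.maximum(0.0, x @ w1 + b1)      # ReLU hidden layer
    logit = h @ w2 + b2
    return 1.0 / (1.0 + np.exp(-logit))   # sigmoid

# Toy scene: 4 individuals with 3-d pose/position features and a
# hypothesised interaction graph (weights are random placeholders).
rng = np.random.default_rng(0)
feats = rng.normal(size=(4, 3))
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 0],
                [0, 0, 0, 0]])
w1, b1 = rng.normal(size=(6, 8)), np.zeros(8)
w2, b2 = rng.normal(size=8), 0.0

emb_a = aggregate_neighbourhood(feats, adj, 0)
emb_b = aggregate_neighbourhood(feats, adj, 1)
p_link = mlp_link_score(emb_a, emb_b, w1, b1, w2, b2)  # value in (0, 1)
```

Thresholding such pairwise link probabilities and taking connected components would then recover the interaction groups; the full method additionally derives the input features from RGB and depth data.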