Compared with the progress made on human activity classification, much less success has been achieved on human interaction understanding (HIU). Apart from the fact that the latter task is much more challenging, the main cause is that recent approaches learn human interactive relations via shallow graphical models, which are inadequate for modeling complicated human interactions. In this paper, we propose a consistency-aware graph network, which combines the representational power of graph networks with consistency-aware reasoning to facilitate the HIU task. Our network consists of three components: a backbone CNN to extract image features, a factor graph network to learn third-order interactive relations among participants, and a consistency-aware reasoning module to enforce labeling and grouping consistencies. Our key observation is that the consistency-aware reasoning bias for HIU can be embedded into an energy function, the minimization of which delivers consistent predictions. We propose an efficient mean-field inference algorithm, so that all modules of our network can be trained jointly in an end-to-end manner. Experimental results show that our approach achieves leading performance on three benchmarks.
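To make the idea of minimizing a consistency-aware energy with mean-field inference more concrete, the following is a minimal sketch and not the network described above. It assumes a simplified pairwise energy (the abstract refers to third-order relations), in which each person has a per-label score and every pair of interacting people pays a fixed penalty when their labels disagree. The names `mean_field`, `unary`, `edges`, and `weight` are hypothetical and chosen for illustration only.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def mean_field(unary, edges, weight=2.0, iters=10):
    """Approximate the label marginals of a toy pairwise energy.

    Assumed energy (illustrative, not the paper's):
        E(y) = -sum_i unary[i][y_i] + weight * sum_{(i,j)} [y_i != y_j]
    unary[i][k]: score of label k for person i (higher is better).
    edges: pairs (i, j) of people assumed to interact.
    """
    n = len(unary)
    neighbors = [[] for _ in range(n)]
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)

    # Initialize each marginal from the unary scores alone.
    q = [softmax(row) for row in unary]
    for _ in range(iters):
        new_q = []
        for i in range(n):
            scores = []
            for k in range(len(unary[i])):
                # Expected disagreement penalty if person i takes label k:
                # each neighbor j disagrees with probability 1 - q_j(k).
                penalty = sum(weight * (1.0 - q[j][k]) for j in neighbors[i])
                scores.append(unary[i][k] - penalty)
            new_q.append(softmax(scores))
        q = new_q  # synchronous update of all marginals
    return q


# Person 0 strongly prefers label 0, person 1 weakly prefers label 1,
# and the two are assumed to interact. Person 2 is on their own.
unary = [[2.0, 0.0], [0.0, 0.5], [0.0, 1.5]]
edges = [(0, 1)]
q = mean_field(unary, edges)
labels = [max(range(len(row)), key=row.__getitem__) for row in q]
print(labels)
```

In this toy setting, the consistency penalty pulls person 1 toward the label of their interaction partner, while the isolated person 2 keeps the label favored by their own score. Because every update is a differentiable softmax, a fixed number of such iterations can in principle be unrolled inside a network and trained end to end, which is the general property the abstract relies on.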