Recent scene graph generation (SGG) frameworks have focused on learning complex relationships among multiple objects in an image. Because message passing neural networks (MPNNs) model high-order interactions between objects and their neighboring objects, they have become the dominant representation learning modules for SGG. However, existing MPNN-based frameworks treat the scene graph as a homogeneous graph, which restricts the context-awareness of visual relations between objects. That is, they overlook the fact that relations tend to be highly dependent on the objects with which they are associated. In this paper, we propose an unbiased heterogeneous scene graph generation (HetSGG) framework that captures relation-aware context using message passing neural networks. We devise a novel message passing layer, called relation-aware message passing neural network (RMP), that aggregates the contextual information of an image while taking into account the predicate type between objects. Our extensive evaluations demonstrate that HetSGG outperforms state-of-the-art methods, with especially strong improvements on tail predicate classes.
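To illustrate the core idea of relation-aware (heterogeneous) message passing described above, the following is a minimal sketch in the spirit of relation-type-conditioned aggregation: each neighbor's message is transformed by a weight matrix specific to the predicate type of the connecting edge, rather than by a single shared matrix as in homogeneous MPNNs. The function name, data layout, and mean-style aggregation are illustrative assumptions, not the paper's exact RMP layer.

```python
import numpy as np

def relation_aware_mp(node_feats, edges, rel_weights):
    """One illustrative layer of relation-aware message passing.

    node_feats  : (N, d) array of object features
    edges       : list of (src, dst, rel) triples
    rel_weights : dict mapping predicate type -> (d, d) weight matrix

    Each incoming message is transformed by the weight matrix of its
    predicate type, then mean-aggregated together with the node's own
    feature (a simplifying assumption; the actual RMP layer differs).
    """
    out = node_feats.copy()           # self message
    counts = np.ones(len(node_feats)) # 1 for the self message
    for src, dst, rel in edges:
        out[dst] += rel_weights[rel] @ node_feats[src]
        counts[dst] += 1
    return out / counts[:, None]

# Tiny usage example: two objects linked by an "on" predicate.
feats = np.eye(2)
W = {"on": 2.0 * np.eye(2)}  # hypothetical per-predicate transform
out = relation_aware_mp(feats, [(0, 1, "on")], W)
```

Because the transform is indexed by predicate type, the same neighbor feature contributes differently depending on which relation connects the two objects, which is the context-awareness that a homogeneous MPNN cannot express.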