Scene graph generation aims to interpret an input image by explicitly modelling the objects it may contain and the relationships between them. Previous methods predominantly solve this task with message passing neural network models. Such approximate inference models generally assume that the output variables are fully independent, and thus ignore informative higher-order structural interactions. This can lead to inconsistent interpretations of an input image. In this paper, we propose a novel neural belief propagation method to generate the scene graph. It employs a structural Bethe approximation, rather than the mean field approximation, to infer the associated marginals. To find a better bias-variance trade-off, the proposed model incorporates both pairwise and higher-order interactions into the associated scoring function. It achieves state-of-the-art performance on various popular scene graph generation benchmarks.
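To make the contrast between the two approximations concrete, the following is a minimal sketch on a toy model, not the neural method proposed in the paper. It assumes a hypothetical three-node chain with binary labels and hand-picked potentials (`UNARY`, `PAIR`, and all function names are illustrative), and compares classical sum-product belief propagation, whose fixed points correspond to stationary points of the Bethe free energy, against naive mean field, which fits a fully factorized distribution. Only pairwise interactions are modelled here; the higher-order terms mentioned above are omitted.

```python
import itertools
import math

# Hypothetical toy model (not from the paper): a 3-node chain, binary labels.
UNARY = [[1.0, 2.0], [1.0, 1.0], [3.0, 1.0]]  # UNARY[i][x_i]
EDGES = [(0, 1), (1, 2)]
PAIR = [[2.0, 1.0], [1.0, 2.0]]  # symmetric potential shared by every edge
N, K = 3, 2


def normalize(v):
    s = sum(v)
    return [x / s for x in v]


def neighbors(i):
    return [b if a == i else a for a, b in EDGES if i in (a, b)]


def exact_marginals():
    """Brute-force marginals by enumerating all K**N joint states."""
    marg = [[0.0] * K for _ in range(N)]
    for x in itertools.product(range(K), repeat=N):
        w = 1.0
        for i in range(N):
            w *= UNARY[i][x[i]]
        for i, j in EDGES:
            w *= PAIR[x[i]][x[j]]
        for i in range(N):
            marg[i][x[i]] += w
    return [normalize(m) for m in marg]


def belief_propagation(iters=10):
    """Sum-product with parallel updates; msgs[(i, j)] is the message i -> j."""
    msgs = {}
    for i, j in EDGES:
        msgs[(i, j)] = [1.0] * K
        msgs[(j, i)] = [1.0] * K
    for _ in range(iters):
        new = {}
        for (i, j) in msgs:
            out = []
            for xj in range(K):
                total = 0.0
                for xi in range(K):
                    prod = UNARY[i][xi] * PAIR[xi][xj]
                    for k in neighbors(i):
                        if k != j:
                            prod *= msgs[(k, i)][xi]
                    total += prod
                out.append(total)
            new[(i, j)] = normalize(out)
        msgs = new
    beliefs = []
    for i in range(N):
        b = list(UNARY[i])
        for k in neighbors(i):
            b = [b[xi] * msgs[(k, i)][xi] for xi in range(K)]
        beliefs.append(normalize(b))
    return beliefs


def mean_field(iters=50):
    """Coordinate-ascent updates for a fully factorized q(x) = prod_i q_i(x_i)."""
    q = [[1.0 / K] * K for _ in range(N)]
    for _ in range(iters):
        for i in range(N):
            logits = []
            for xi in range(K):
                s = math.log(UNARY[i][xi])
                for j in neighbors(i):
                    s += sum(q[j][xj] * math.log(PAIR[xi][xj]) for xj in range(K))
                logits.append(s)
            q[i] = normalize([math.exp(v) for v in logits])
    return q


def max_abs_error(a, b):
    return max(abs(a[i][x] - b[i][x]) for i in range(N) for x in range(K))


if __name__ == "__main__":
    exact = exact_marginals()
    print("BP error:", max_abs_error(belief_propagation(), exact))
    print("mean-field error:", max_abs_error(mean_field(), exact))
```

Because the toy graph is a tree, belief propagation recovers the exact marginals, whereas the mean field estimate generally does not, since it discards the dependence between neighbouring variables. On loopy graphs neither method is exact, so this sketch illustrates the independence assumption rather than the paper's reported gains.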