To understand a scene in depth not only involves locating/recognizing individual objects, but also requires to infer the relationships and interactions among them. However, since the distribution of real-world relationships is seriously unbalanced, existing methods perform quite poorly for the less frequent relationships. In this work, we find that the statistical correlations between object pairs and their relationships can effectively regularize semantic space and make prediction less ambiguous, and thus well address the unbalanced distribution issue. To achieve this, we incorporate these statistical correlations into deep neural networks to facilitate scene graph generation by developing a Knowledge-Embedded Routing Network. More specifically, we show that the statistical correlations between objects appearing in images and their relationships, can be explicitly represented by a structured knowledge graph, and a routing mechanism is learned to propagate messages through the graph to explore their interactions. Extensive experiments on the large-scale Visual Genome dataset demonstrate the superiority of the proposed method over current state-of-the-art competitors.
翻译:深入理解一幕不仅涉及定位/识别单个物体,而且需要推断它们之间的关系和相互作用。然而,由于真实世界关系的分布严重不平衡,现有方法对于较不常见的关系效果很差。在这项工作中,我们发现,对象对和它们之间的关系之间的统计相关性可以有效地规范语义空间,降低预测的模糊性,从而解决分布不平衡问题。为了实现这一目标,我们将这些统计相关性纳入深层神经网络,通过开发一个知识型流动网络,为生成场景图形提供便利。更具体地说,我们表明,图像中显示的物体及其关系之间的统计相关性可以通过结构化的知识图表明确体现,并学会通过图表传播信息以探索它们的互动关系。关于大型视觉基因组数据集的广泛实验表明,拟议方法优于当前最先进的竞争者。