Scene graph generation (SGG) builds on detected objects to predict pairwise visual relations between them, producing an abstract description of image content. Existing works have shown that if the links between objects are given as prior knowledge, SGG performance improves significantly. Inspired by this observation, in this article we propose a relation regularized network (R2-Net), which predicts whether a relationship exists between two objects and encodes this relation into object feature refinement for better SGG. Specifically, we first construct an affinity matrix over the detected objects to represent the probability that a relationship exists between each pair. Graph convolutional networks (GCNs) over this relation affinity matrix are then used as object encoders, producing relation-regularized representations of objects. With these relation-regularized features, our R2-Net can effectively refine object labels and generate scene graphs. Extensive experiments on the Visual Genome dataset for three SGG tasks (i.e., predicate classification, scene graph classification, and scene graph detection) demonstrate the effectiveness of the proposed method. Ablation studies also verify the key roles of the proposed components in the performance improvement.
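To make the core idea concrete, the following is a minimal PyTorch sketch of a relation-regularized object encoder in the spirit described above: a learned affinity matrix giving the probability of a relation between each object pair, followed by GCN message passing over that matrix. The module name `AffinityGCN`, the bilinear affinity scorer, and all dimensions are illustrative assumptions, not the authors' actual implementation.

```python
# Hypothetical sketch of a relation-regularized GCN encoder; names,
# scoring function, and hyperparameters are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F


class AffinityGCN(nn.Module):
    """Refine detected-object features with a GCN over a predicted
    relation affinity matrix (one possible reading of the R2-Net idea)."""

    def __init__(self, in_dim: int, hid_dim: int):
        super().__init__()
        # Bilinear scorer estimating the probability that a relation
        # exists between each pair of object features (an assumption).
        self.affinity = nn.Bilinear(in_dim, in_dim, 1)
        self.gcn_weight = nn.Linear(in_dim, hid_dim, bias=False)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        n = feats.size(0)
        # Pairwise affinity matrix A in [0, 1]: A[i, j] ~ P(relation i<->j).
        left = feats.unsqueeze(1).expand(n, n, -1).reshape(n * n, -1)
        right = feats.unsqueeze(0).expand(n, n, -1).reshape(n * n, -1)
        A = torch.sigmoid(self.affinity(left, right)).view(n, n)
        # Symmetric normalization with self-loops, as in standard GCNs.
        A_hat = A + torch.eye(n, device=feats.device)
        d_inv_sqrt = A_hat.sum(dim=1).clamp(min=1e-6).pow(-0.5)
        A_norm = d_inv_sqrt.unsqueeze(1) * A_hat * d_inv_sqrt.unsqueeze(0)
        # One round of message passing yields relation-regularized features.
        return F.relu(A_norm @ self.gcn_weight(feats))


# Usage: refine 5 detected-object features of dimension 256.
encoder = AffinityGCN(in_dim=256, hid_dim=256)
refined = encoder(torch.randn(5, 256))  # -> (5, 256) relation-regularized
```

In this reading, the affinity matrix acts as a soft, learned adjacency structure, so message passing is weighted toward object pairs that are likely to be related; the refined features can then feed object label refinement and predicate classification.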