Applications based on image retrieval require editing and associating in intermediate spaces that are representative of high-level concepts like objects and their relationships, rather than dense, pixel-level representations like RGB images or semantic-label maps. We focus on one such representation, scene graphs, and propose a novel scene expansion task where we enrich an input seed graph by adding new nodes (objects) and the corresponding relationships. To this end, we formulate scene graph expansion as a sequential prediction task involving multiple steps of first predicting a new node and then predicting the set of relationships between the newly predicted node and the previous nodes in the graph. We propose a sequencing strategy for observed graphs that retains the clustering patterns amongst nodes. In addition, we leverage external knowledge to train our graph generation model, enabling greater generalization of node predictions. Because existing maximum mean discrepancy (MMD) based metrics for graph generation are ill-suited to evaluating predicted relationships between nodes (objects), we design novel metrics that comprehensively evaluate different aspects of the predicted relations. We conduct extensive experiments on the Visual Genome and VRD datasets to evaluate the expanded scene graphs using the standard MMD-based metrics and our proposed metrics. We observe that the graphs generated by our method, GEMS, better represent the real distribution of scene graphs than those generated by baseline methods such as GraphRNN.
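The sequential expansion procedure described above can be sketched as a simple autoregressive loop: at each step, first predict a new node (object), then predict the relationships between that node and every node already in the graph. The sketch below is illustrative only; the function and predictor names (`expand_scene_graph`, `toy_node`, `toy_relation`) are hypothetical stand-ins, not the actual GEMS model.

```python
# Hedged sketch of sequential scene-graph expansion: grow a seed graph
# one node at a time, predicting edges to all prior nodes after each
# new node. The predictors here are toy stubs for demonstration only.
from typing import Callable, List, Optional, Tuple

Edge = Tuple[str, str, str]  # (subject, predicate, object)

def expand_scene_graph(seed_nodes: List[str],
                       seed_edges: List[Edge],
                       num_steps: int,
                       predict_node: Callable,
                       predict_relation: Callable) -> Tuple[List[str], List[Edge]]:
    """Autoregressively expand a seed scene graph for num_steps steps."""
    nodes = list(seed_nodes)
    edges = list(seed_edges)
    for _ in range(num_steps):
        new_node = predict_node(nodes, edges)          # step 1: new object
        for prior in nodes:                            # step 2: edge set
            rel = predict_relation(new_node, prior, nodes, edges)
            if rel is not None:                        # None means no relation
                edges.append((new_node, rel, prior))
        nodes.append(new_node)
    return nodes, edges

# Toy stub predictors (a real model would condition on the graph so far).
def toy_node(nodes: List[str], edges: List[Edge]) -> str:
    return f"object_{len(nodes)}"

def toy_relation(new: str, prior: str,
                 nodes: List[str], edges: List[Edge]) -> Optional[str]:
    # Relate the new node only to the most recently added node.
    return "near" if prior == nodes[-1] else None

nodes, edges = expand_scene_graph(
    seed_nodes=["man", "horse"],
    seed_edges=[("man", "riding", "horse")],
    num_steps=2,
    predict_node=toy_node,
    predict_relation=toy_relation,
)
# nodes → ["man", "horse", "object_2", "object_3"]
```

In the paper's formulation the two prediction steps would be realized by learned models (conditioned on the partial graph and external knowledge); the loop structure above only illustrates the node-then-edges decomposition of each expansion step.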