In this paper, we study the problem of parsing structured knowledge graphs from textual descriptions. In particular, we consider the scene graph representation, which captures objects together with their attributes and relations; this representation has proven useful across a variety of vision and language applications. We begin by introducing an alternative but equivalent edge-centric view of scene graphs that connects them to dependency parses. Together with a careful redesign of the label and action space, this allows us to merge the two-stage pipeline used in prior work (generic dependency parsing followed by simple post-processing) into a single stage, enabling end-to-end training. The scene graphs generated by our learned neural dependency parser achieve an F-score similarity of 49.67% to ground-truth graphs on our evaluation set, surpassing the best previous approach by 5%. We further demonstrate the effectiveness of our learned parser on image retrieval applications.
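As an illustrative aside (not code from the paper), the following minimal Python sketch shows one plausible encoding of the scene graph representation described above: a set of objects, (object, attribute) pairs, and (subject, predicate, object) relation triples. All class and field names here are assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical scene graph container; not the paper's actual data structure.
@dataclass
class SceneGraph:
    objects: List[str] = field(default_factory=list)                     # e.g. ["man", "horse"]
    attributes: List[Tuple[str, str]] = field(default_factory=list)      # (object, attribute) pairs
    relations: List[Tuple[str, str, str]] = field(default_factory=list)  # (subject, predicate, object) triples

# A caption like "a young man feeding a white horse" might parse to:
g = SceneGraph(
    objects=["man", "horse"],
    attributes=[("man", "young"), ("horse", "white")],
    relations=[("man", "feeding", "horse")],
)
```

Under the edge-centric view, each attribute pair and relation triple can be read as a labeled edge between words, mirroring the labeled arcs of a dependency parse; this correspondence is what lets a dependency parser emit scene graphs directly.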