Text-to-Graph extraction aims to automatically extract information graphs, consisting of mentions and types, from natural language text. Existing approaches, such as table filling and pairwise scoring, have shown impressive performance on various information extraction tasks, but they are difficult to scale to datasets with longer input texts because their space and time complexity is quadratic in the input length. In this work, we propose a Hybrid Span Generator (HySPA) that invertibly maps the information graph to an alternating sequence of nodes and edge types, and directly generates such sequences with a hybrid span decoder that decodes both spans and types recurrently with linear time and space complexity. Extensive experiments on the ACE05 dataset show that our approach significantly outperforms the state of the art on the joint entity and relation extraction task.
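To make the idea of an invertible alternating sequence concrete, the following is a minimal sketch, not the construction used by HySPA. It assumes nodes are typed spans `(start, end, entity_type)`, edges are `(head, relation_type, tail)` triples, and a hypothetical virtual edge type `[SEP]` closes each triple or isolated node. The ordering, the `[SEP]` marker, and the function names `encode` and `decode` are all illustrative assumptions. The sequence length is linear in the number of nodes and edges, and decoding recovers the original graph exactly.

```python
from typing import List, Set, Tuple, Union

Node = Tuple[int, int, str]      # (start token, end token, entity type)
Edge = Tuple[Node, str, Node]    # (head node, relation type, tail node)
Token = Union[Node, str]
SEP = "[SEP]"                    # hypothetical virtual edge type (assumption)


def encode(nodes: Set[Node], edges: Set[Edge]) -> List[Token]:
    """Map a graph to a sequence: node, type, node, type, ..."""
    seq: List[Token] = []
    linked: Set[Node] = set()
    for head, rel, tail in sorted(edges):
        # Each triple contributes node, edge type, node, virtual edge type.
        seq += [head, rel, tail, SEP]
        linked.update((head, tail))
    for node in sorted(nodes - linked):
        # Nodes without any edge are kept as (node, SEP) pairs.
        seq += [node, SEP]
    return seq


def decode(seq: List[Token]) -> Tuple[Set[Node], Set[Edge]]:
    """Invert encode() by reading (node, type) pairs left to right."""
    nodes: Set[Node] = set()
    edges: Set[Edge] = set()
    pending = None  # (head, relation) waiting for its tail node
    for i in range(0, len(seq), 2):
        node, typ = seq[i], seq[i + 1]
        nodes.add(node)
        if pending is not None:
            edges.add((pending[0], pending[1], node))
            pending = None
        if typ != SEP:
            pending = (node, typ)
    return nodes, edges


# Toy example: "Alice works for Acme in Paris" (spans and labels are made up).
alice, acme, paris = (0, 1, "PER"), (3, 4, "ORG"), (5, 6, "GPE")
nodes = {alice, acme, paris}
edges = {(alice, "ORG-AFF", acme)}

seq = encode(nodes, edges)
recovered = decode(seq)
```

Here `seq` alternates strictly between nodes (even positions) and edge types (odd positions), and `recovered` equals `(nodes, edges)` whenever every edge endpoint is also listed in `nodes`.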