In recent years, there has been a surge of generation-based information extraction work, which allows more direct use of pre-trained language models and efficiently captures output dependencies. However, previous generative methods that use lexical representations do not naturally fit document-level relation extraction (DocRE), where there are multiple entities and relational facts. In this paper, we investigate the root cause of the underwhelming performance of existing generative DocRE models and find that the culprit is the inadequacy of the training paradigm, not the capacity of the models. We propose to generate a symbolic and ordered sequence from the relation matrix, which is deterministic and easier for the model to learn. Moreover, we design a parallel row-generation method to handle overlong target sequences. In addition, we introduce several negative sampling strategies to improve performance with balanced signals. Experimental results on four datasets show that our proposed method improves the performance of generative DocRE models. We have released our code at https://github.com/ayyyq/DORE.
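To make the idea of a "symbolic and ordered sequence from the relation matrix" concrete, here is a minimal sketch of one possible linearization: relational facts are treated as (head, relation, tail) index triples and sorted into a deterministic token sequence. The token format (`<e0>`, `<r3>`, etc.) and the function name are illustrative assumptions, not the paper's exact scheme.

```python
# Hypothetical sketch of linearizing a DocRE relation matrix into a
# deterministic, symbolic target sequence. The ID-token format is an
# assumption for illustration, not the released implementation.

def linearize_relation_matrix(triples):
    """Sort (head, relation, tail) index triples and emit symbolic tokens.

    Sorting makes the target sequence order-deterministic, so the model
    never has to guess among equivalent permutations of the same facts.
    """
    tokens = []
    for head, rel, tail in sorted(triples):
        tokens.extend([f"<e{head}>", f"<r{rel}>", f"<e{tail}>"])
    return " ".join(tokens)

# Two facts expressed with entity and relation indices:
triples = {(2, 5, 0), (0, 3, 1)}
print(linearize_relation_matrix(triples))
# -> <e0> <r3> <e1> <e2> <r5> <e0>
```

Because the output uses index tokens rather than lexical mentions, the same entity is represented identically wherever it appears, which sidesteps the surface-form ambiguity the abstract attributes to lexical generative methods.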