Document-level relation extraction (DocRE) aims to identify the semantic relations between entities within a single document. One major challenge of DocRE is extracting the decisive details about a specific entity pair from long text. In many cases, however, only a fraction of the text carries the required information, even within the manually labeled supporting evidence. To better capture and exploit instructive information, we propose a novel expLicit syntAx Refinement and Subsentence mOdeliNg based framework (LARSON). By introducing extra syntactic information, LARSON can model subsentences of arbitrary granularity and efficiently screen for the instructive ones. Moreover, we incorporate refined syntax into the text representations, which further improves the performance of LARSON. Experimental results on three benchmark datasets (DocRED, CDR, and GDA) demonstrate that LARSON significantly outperforms existing methods.