Document-level relation extraction (DocRE) predicts relations between entity pairs, a task that requires long-range, context-dependent reasoning over a document. As a typical multi-label classification problem, DocRE faces the challenge of distinguishing a small set of positive relations from the large majority of negative ones. This challenge becomes harder still when the dataset contains a significant number of annotation errors. In this work, we aim to better integrate discriminability and robustness for the DocRE problem. Specifically, we first design an effective loss function that endows both the probabilistic outputs and the internal representations with high discriminability. To this end, we customize entropy minimization and supervised contrastive learning for the challenging multi-label and long-tailed learning setting. To mitigate the impact of label errors, we equip our method with a novel negative label sampling strategy that strengthens model robustness. In addition, we introduce two new data regimes that mimic more realistic scenarios with annotation errors and use them to evaluate our sampling strategy. Experimental results verify the effectiveness of each component and show that our method achieves new state-of-the-art results on the DocRED dataset, on its recently cleaned version Re-DocRED, and on the proposed data regimes.
Title: Integrating Discriminability and Robustness for Document-Level Relation Extraction
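The abstract does not spell out the negative label sampling strategy, so the following is only a minimal sketch of the general idea under stated assumptions: every positive label contributes to a binary cross-entropy loss, while each negative label is kept with some probability, so that unannotated true relations (false negatives) are penalized less often. The function name `sampled_bce_loss` and the parameter `neg_keep_rate` are hypothetical and are not taken from the paper.

```python
import numpy as np


def sampled_bce_loss(logits, labels, neg_keep_rate=0.5, rng=None):
    """Binary cross-entropy over all positives and a random subset of negatives.

    Illustrative assumption only: the paper's actual sampling strategy
    may differ from this uniform random sampling.
    """
    rng = np.random.default_rng() if rng is None else rng
    logits = np.asarray(logits, dtype=float)
    labels = np.asarray(labels, dtype=float)

    # Keep every positive label; keep each negative with probability neg_keep_rate.
    keep = (labels == 1) | (rng.random(labels.shape) < neg_keep_rate)

    # Numerically stable per-label BCE with logits:
    # max(x, 0) - x * y + log(1 + exp(-|x|))
    per_label = (
        np.maximum(logits, 0)
        - logits * labels
        + np.log1p(np.exp(-np.abs(logits)))
    )

    # Average only over the labels that were kept.
    return float((per_label * keep).sum() / max(keep.sum(), 1))
```

Here `logits` and `labels` would have shape (number of entity pairs, number of relation types). Setting `neg_keep_rate=1.0` recovers the ordinary mean binary cross-entropy, and lower values drop more negative labels from the loss.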