Document-level relation extraction (RE) aims to extract relations among entities expressed across multiple sentences, which can be viewed as a multi-label classification problem. In a typical document, most entity pairs do not express any pre-defined relation and are labeled as "none" or "no relation". For good document-level RE performance, it is crucial to distinguish such none class instances (entity pairs) from those of pre-defined classes (relations). However, most existing methods only estimate the probability of each pre-defined relation independently, without considering the probability of "no relation". This ignores the context of entity pairs and the label correlations between the none class and the pre-defined classes, leading to sub-optimal predictions. To address this problem, we propose a new multi-label loss that encourages large margins of label confidence scores between each pre-defined class and the none class, which enables capturing label correlations and context-dependent thresholding for label prediction. To gain further robustness against the positive-negative imbalance and mislabeled data that can appear in real-world RE datasets, we propose a margin regularization and a margin shifting technique. Experimental results demonstrate that our method significantly outperforms existing multi-label losses for document-level RE, and works well in other multi-label tasks such as emotion classification when none class instances are available for training.
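The margin idea described above can be sketched with a simple hinge-style loss over per-pair confidence scores: each positive relation's score should exceed the none-class score by a margin, each negative relation's score should trail it by the same margin, and at prediction time the none-class score acts as a per-pair (context-dependent) threshold. This is a minimal illustrative sketch under assumed names and a pairwise hinge form, not the paper's exact loss, which additionally includes margin regularization and margin shifting.

```python
def none_class_margin_loss(scores, none_score, labels, margin=1.0):
    """Hinge-style margin loss against the none class (illustrative sketch).

    scores: per-relation confidence scores for one entity pair
    none_score: the none ("no relation") class score for the same pair
    labels: 0/1 indicators of which relations hold for the pair
    """
    loss = 0.0
    for s, y in zip(scores, labels):
        if y == 1:
            # positive relation: its score should exceed none_score by `margin`
            loss += max(0.0, margin - (s - none_score))
        else:
            # negative relation: none_score should exceed it by `margin`
            loss += max(0.0, margin - (none_score - s))
    return loss


def predict(scores, none_score):
    # context-dependent thresholding: the none-class score is the
    # per-pair decision threshold, so no global cutoff is tuned
    return [i for i, s in enumerate(scores) if s > none_score]
```

With this formulation, a pair whose relation scores all fall below its none-class score is predicted as "no relation", which is how the none class and the pre-defined classes interact rather than being scored independently.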