Document-level relation extraction (RE) aims to extract relations among entities expressed across multiple sentences, and can be viewed as a multi-label classification problem. In a typical document, most entity pairs do not express any pre-defined relation and are labeled as "none" or "no relation". For good document-level RE performance, it is crucial to distinguish such \textit{none} class instances (entity pairs) from those of the pre-defined classes (relations). However, most existing methods estimate the probability of each pre-defined relation independently, without considering the probability of "no relation". This ignores the context of entity pairs and the label correlations between the none class and the pre-defined classes, leading to sub-optimal predictions. To address this problem, we propose a new multi-label loss that encourages large \textit{margins} between the label confidence score of each pre-defined class and that of the none class, which enables the model to capture label correlations and to use a context-dependent threshold for label prediction. To gain further robustness against the positive-negative imbalance and mislabeled data that can appear in real-world RE datasets, we propose a margin regularization and a margin shifting technique. Experimental results demonstrate that our method significantly outperforms existing multi-label losses for document-level RE, and that it also works well in other multi-label tasks, such as emotion classification, when none class instances are available for training.
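To make the margin idea concrete, the following is a minimal illustrative sketch (not the paper's exact formulation): a hinge-style loss that pushes each positive relation's score above the none-class score by a margin, pushes each absent relation's score below it, and at inference predicts a relation only when its score exceeds the none-class score, so the none class acts as a context-dependent threshold. The function names, the hinge form, and the margin value are assumptions for illustration.

```python
def none_margin_loss(scores, labels, none_idx=0, margin=1.0):
    """Hinge-style margin loss against the none-class score.

    Illustrative sketch only; the paper's loss, margin
    regularization, and margin shifting are not reproduced here.

    scores: per-class confidence scores for one entity pair
    labels: binary indicators over classes (labels[none_idx] unused)
    """
    s_none = scores[none_idx]
    loss = 0.0
    for r, s_r in enumerate(scores):
        if r == none_idx:
            continue
        if labels[r] == 1:
            # expressed relation: its score should exceed the
            # none-class score by at least the margin
            loss += max(0.0, margin - (s_r - s_none))
        else:
            # absent relation: the none-class score should exceed
            # this relation's score by at least the margin
            loss += max(0.0, margin - (s_none - s_r))
    return loss


def predict(scores, none_idx=0):
    # context-dependent thresholding: predict a relation only if
    # its score beats the none-class score for this entity pair
    return [r for r, s in enumerate(scores)
            if r != none_idx and s > scores[none_idx]]
```

Because the threshold is the none-class score of the same entity pair, it adapts to each pair's context instead of relying on a single global cutoff.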