Document-level relation extraction (RE) aims to identify relations between entities across multiple sentences. Most previous methods address document-level RE under full supervision. In real-world scenarios, however, completely labeling all relations in a document is expensive and difficult, because the number of entity pairs grows quadratically with the number of entities. To address this common incomplete-labeling problem, we propose a unified positive-unlabeled learning framework: shift and squared ranking loss positive-unlabeled (SSR-PU) learning. This is the first application of positive-unlabeled (PU) learning to document-level RE. Because the labeled data of a dataset may cause a prior shift in the unlabeled data, we introduce PU learning under prior shift of the training data. In addition, using the none-class score as an adaptive threshold, we propose a squared ranking loss and prove its Bayesian consistency with multi-label ranking metrics. Extensive experiments demonstrate that our method improves on the previous baseline by about 14 F1 points under incomplete labeling. It also outperforms previous state-of-the-art results under both fully supervised and extremely unlabeled settings.
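As a rough illustration of two ideas in the abstract, the sketch below counts candidate entity pairs and applies a none-class score as a per-pair threshold. It is a minimal sketch under stated assumptions, not the SSR-PU implementation: the function names, the use of ordered (head, tail) pairs, and the convention that index 0 holds the none-class score are all hypothetical choices made for this example.

```python
from itertools import permutations


def candidate_pairs(entities):
    """Return all ordered (head, tail) pairs of distinct entities.

    Assumption: pairs are ordered, so n entities give n * (n - 1) pairs,
    which grows quadratically with n.
    """
    return list(permutations(entities, 2))


def predict_relations(scores, none_index=0):
    """Return the class indices whose score exceeds the none-class score.

    Assumption: `scores` holds one score per class for a single entity
    pair, and `scores[none_index]` is the none-class score that serves
    as the adaptive threshold for that pair.
    """
    threshold = scores[none_index]
    return [
        i for i, s in enumerate(scores)
        if i != none_index and s > threshold
    ]


if __name__ == "__main__":
    # 20 entities already give 20 * 19 = 380 pairs to annotate.
    print(len(candidate_pairs(range(20))))
    # Classes 1 and 3 score above the none class (index 0).
    print(predict_relations([0.5, 1.2, -0.3, 0.9]))
```

Because the threshold is read from each pair's own score vector, different entity pairs can end up with different cutoffs, which is the sense in which the threshold is adaptive here.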