The effort to reduce noise in training data generated by distant supervision (DS) began when DS was first introduced into the relation extraction (RE) task. For the past decade, researchers have applied the multi-instance learning (MIL) framework to find the most reliable features from a bag of sentences. Although the bag-level pattern of MIL can greatly reduce DS noise, it fails to represent many other useful sentence features in the datasets. In many cases, these sentence features can only be acquired through extra sentence-level human annotation at heavy cost. Therefore, the performance of distantly supervised RE models is bounded. In this paper, we go beyond the typical MIL framework and propose a novel contrastive instance learning (CIL) framework. Specifically, we regard the initial MIL as the relational triple encoder and constrain positive pairs against negative pairs for each instance. Experiments demonstrate the effectiveness of our proposed framework, with significant improvements over previous methods on NYT10, GDS and KBP.
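To make the instance-level contrast concrete, below is a minimal sketch of an InfoNCE-style contrastive loss of the kind the abstract alludes to: each instance representation is pulled toward a positive instance (one sharing the same relational triple) and pushed away from negatives (instances from other bags). The function name, temperature value, and pair construction are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def contrastive_instance_loss(anchor, positive, negatives, temperature=0.1):
    """Illustrative InfoNCE-style objective (assumed setup, not the paper's exact loss).

    anchor:    (d,)   representation of one instance
    positive:  (d,)   an instance sharing the anchor's relational triple
    negatives: (k, d) instances drawn from other bags
    """
    anchor = F.normalize(anchor, dim=-1)
    positive = F.normalize(positive, dim=-1)
    negatives = F.normalize(negatives, dim=-1)

    pos_sim = torch.dot(anchor, positive) / temperature   # scalar similarity to the positive
    neg_sim = negatives @ anchor / temperature             # (k,) similarities to the negatives

    # The positive pair is placed at index 0, so the cross-entropy target is 0.
    logits = torch.cat([pos_sim.unsqueeze(0), neg_sim]).unsqueeze(0)  # (1, k+1)
    return F.cross_entropy(logits, torch.zeros(1, dtype=torch.long))
```

In such a setup, the encoder producing `anchor`, `positive`, and `negatives` would be the MIL-based relational triple encoder described above, and this instance-level term would be added alongside the usual bag-level objective.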