Distant supervision can effectively label data for relation extraction, but it suffers from noisy labels. Recent work mainly applies soft bag-level noise reduction strategies that look for the relatively better samples within a sentence bag, which is suboptimal compared with making a hard decision about false positive samples at the sentence level. In this paper, we introduce an adversarial learning framework, which we name DSGAN, to learn a sentence-level true-positive generator. Inspired by Generative Adversarial Networks, we regard the positive samples generated by the generator as negative samples for training the discriminator. The optimal generator is obtained when the discrimination ability of the discriminator shows its greatest decline. We use the generator to filter the distant supervision training dataset and redistribute the false positive instances into the negative set, thereby providing a cleaned dataset for relation classification. The experimental results show that the proposed strategy significantly improves the performance of distant supervision relation extraction compared to state-of-the-art systems.
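The filtering and redistribution step can be illustrated with a minimal sketch. It assumes a trained generator is available as a function that returns a true-positive probability for each sentence; the names `redistribute` and `toy_score`, the 0.5 threshold, and the example sentences are all hypothetical and are not taken from the paper.

```python
from typing import Callable, List, Tuple


def redistribute(
    positives: List[str],
    negatives: List[str],
    true_positive_prob: Callable[[str], float],
    threshold: float = 0.5,  # assumed cut-off, not specified in the abstract
) -> Tuple[List[str], List[str]]:
    """Split distantly labeled positives into kept and redistributed sentences.

    Sentences scored below the threshold are treated as false positives
    and appended to the negative set.
    """
    kept: List[str] = []
    moved: List[str] = []
    for sentence in positives:
        if true_positive_prob(sentence) >= threshold:
            kept.append(sentence)
        else:
            moved.append(sentence)
    return kept, list(negatives) + moved


def toy_score(sentence: str) -> float:
    # Stand-in for a trained generator: a hand-written rule for illustration only.
    return 0.9 if "born in" in sentence else 0.1


positives = ["Obama was born in Hawaii.", "Obama visited Hawaii last week."]
negatives = ["Paris is lovely in spring."]
clean_pos, clean_neg = redistribute(positives, negatives, toy_score)
```

Here the second sentence mentions both entities but does not express the birthplace relation, so the stand-in scorer moves it to the negative set. In practice the scoring function would be the learned generator rather than a keyword rule.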