Although distant supervision automatically generates training data for relation extraction, it also introduces false-positive (FP) and false-negative (FN) training instances into the generated datasets. While both types of errors degrade the final model performance, previous work on distant supervision denoising has focused mainly on suppressing FP noise and less on resolving the FN problem. Here we propose H-FND, a hierarchical false-negative denoising framework for robust distant supervision relation extraction, as an FN denoising solution. H-FND uses a hierarchical policy that first determines whether each non-relation (NA) instance should be kept, discarded, or revised during training. For instances to be revised, the policy further reassigns them appropriate relations, making them better training inputs. Experiments were conducted on SemEval-2010 and TACRED with controlled FN ratios that randomly turn the relations of training and validation instances into negatives to generate FN instances. In this setting, H-FND revises FN instances correctly and maintains high F1 scores even when 50% of the instances have been turned into negatives. A further experiment on NYT10 shows that H-FND is also applicable in a realistic setting.
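To make the keep/discard/revise mechanism concrete, the following is a minimal sketch of the hierarchical decision for a single NA-labeled instance. All names here (denoise_na_instance, action_probs, relation_probs) are hypothetical illustrations under assumed inputs, not the authors' implementation: it only shows the two-level structure of first choosing an action and then, for revised instances, reassigning a relation.

```python
# Hypothetical sketch of the hierarchical keep/discard/revise decision;
# not the authors' code. Assumes a policy that outputs action probabilities
# and a classifier that outputs a distribution over non-NA relations.
import random
from typing import List, Optional, Tuple

ACTIONS = ("keep", "discard", "revise")

def denoise_na_instance(
    action_probs: List[float],    # policy output over (keep, discard, revise)
    relation_probs: List[float],  # distribution over the non-NA relations
    relations: List[str],         # names of the non-NA relations
) -> Tuple[str, Optional[str]]:
    """Decide what to do with one NA-labeled training instance."""
    # Level 1: sample an action from the hierarchical policy.
    action = random.choices(ACTIONS, weights=action_probs, k=1)[0]
    if action != "revise":
        # "keep" leaves the NA label; "discard" drops the instance entirely.
        return action, None
    # Level 2: for revised instances, reassign the most probable relation.
    best = max(range(len(relations)), key=lambda i: relation_probs[i])
    return action, relations[best]

# Example: an instance the policy suspects is a false negative.
print(denoise_na_instance(
    [0.1, 0.2, 0.7],
    [0.6, 0.3, 0.1],
    ["Cause-Effect", "Component-Whole", "Member-Collection"],
))
```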