The superior performance of supervised relation extraction (RE) methods relies heavily on large amounts of gold standard data. Recent zero-shot relation extraction methods convert the RE task into other NLP tasks and use off-the-shelf models for those tasks to perform inference directly on the test data, without requiring large amounts of RE annotation. A potentially valuable by-product of these methods is large-scale silver standard data. However, the use of this silver standard data has not been further investigated. In this paper, we propose to first detect a small amount of clean data from the silver standard data and then use the selected clean data to finetune the pretrained model. We then use the finetuned model to infer relation types. We also propose a class-aware clean data detection module that takes class information into account when selecting clean data. Experimental results show that our method outperforms the baseline by 12% and 11% on the TACRED and Wiki80 datasets in the zero-shot RE task. By using extra silver standard data from different distributions, the performance can be further improved.
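The clean-data detection step can be illustrated with a minimal sketch. Here we assume, for illustration only, that "clean" silver examples are identified by per-class model confidence, selecting the top fraction within each relation class so that no class is starved of training data; the paper's actual selection criterion may differ.

```python
import numpy as np

def class_aware_select(confidences, silver_labels, ratio=0.2):
    """Select the top-`ratio` most confident silver examples per class.

    Selecting per class (rather than globally) keeps rare relation
    types represented in the finetuning set -- a hypothetical stand-in
    for the class-aware clean data detection module.
    """
    selected = []
    for c in np.unique(silver_labels):
        idx = np.where(silver_labels == c)[0]
        # Keep at least one example per class.
        k = max(1, int(len(idx) * ratio))
        # Highest-confidence examples within class c are treated as "clean".
        top = idx[np.argsort(-confidences[idx])[:k]]
        selected.extend(top.tolist())
    return sorted(selected)

# Toy silver-standard data: 6 examples, 2 relation classes.
conf = np.array([0.9, 0.2, 0.8, 0.95, 0.1, 0.6])
labels = np.array([0, 0, 0, 1, 1, 1])
print(class_aware_select(conf, labels, ratio=0.34))  # one example kept per class
```

The selected subset would then be used to finetune the pretrained model before running inference on the remaining data.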