Distant supervision (DS) is an effective way to expand datasets for relation extraction (RE) models, but it often suffers from high label noise. Existing approaches based on attention, reinforcement learning, or GANs are black-box models, so they neither provide a meaningful interpretation of sample selection in DS nor remain stable across domains. In contrast, this work proposes REIF, a novel model-agnostic instance sampling method for DS based on the influence function (IF). Our method identifies favorable/unfavorable instances in a bag via IF and then performs dynamic instance sampling. We design a fast influence sampling algorithm that reduces the computational complexity from $\mathcal{O}(mn)$ to $\mathcal{O}(1)$, and we analyze its robustness with respect to the chosen sampling function. Experiments show that by simply sampling the favorable instances during training, REIF outperforms a series of baselines with complicated architectures. We also demonstrate that REIF supports interpretable instance selection.
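For context, the IF-based scoring that REIF builds on can be read as the classical influence of a training instance on held-out loss; the display below is a generic sketch of that standard form, not the paper's exact sampling function. Here $L$ denotes the per-instance loss, $\hat\theta$ the trained model parameters, $H_{\hat\theta}$ the empirical Hessian of the training loss over $n$ instances, $z$ a training instance, and $z_{\mathrm{val}}$ a validation instance (all notation introduced here for illustration):

$$\mathcal{I}(z, z_{\mathrm{val}}) = -\,\nabla_\theta L(z_{\mathrm{val}}, \hat\theta)^{\top} H_{\hat\theta}^{-1}\, \nabla_\theta L(z, \hat\theta), \qquad H_{\hat\theta} = \frac{1}{n}\sum_{i=1}^{n} \nabla_\theta^{2} L(z_i, \hat\theta).$$

Under this convention, up-weighting an instance with negative influence decreases the validation loss, so such instances are the natural candidates for "favorable" samples in the bag.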