Distantly supervised models are popular for relation extraction because distant supervision yields a large amount of training data without human annotation. Under distant supervision, a sentence is considered a source for a tuple if it contains both entities of the tuple. However, this condition is too permissive and does not guarantee that the sentence carries relation-specific information. As a result, distantly supervised training data contains considerable noise, which adversely affects model performance. In this paper, we propose a self-ensemble filtering mechanism that filters out noisy samples during the training process. We evaluate our proposed framework on the New York Times dataset, which was obtained via distant supervision. Our experiments with multiple state-of-the-art neural relation extraction models show that the proposed filtering mechanism improves the robustness of the models and increases their F1 scores.