The goal of relation extraction (RE) is to extract the semantic relations between or among entities in text. As a fundamental task in natural language processing, RE requires models that are robust. Despite the high accuracy that current deep neural models have achieved on RE tasks, they are easily affected by spurious correlations. One solution to this problem is to train the model with counterfactually augmented data (CAD) so that it learns causation rather than confounding correlations. However, no attempt has been made to generate counterfactuals for RE tasks. In this paper, we formulate the problem of automatically generating CAD for RE tasks from an entity-centric viewpoint, and develop a novel approach to derive contextual counterfactuals for entities. Specifically, we exploit two elementary topological properties, i.e., centrality and the shortest path, in syntactic and semantic dependency graphs, to first identify and then intervene on the contextual causal features for entities. We conduct a comprehensive evaluation on four RE datasets by combining our proposed approach with a variety of backbone RE models. The results demonstrate that our approach not only improves the performance of the backbones, but also makes them more robust under out-of-domain evaluation.
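The abstract does not specify implementation details, but the two topological properties it names can be illustrated concretely. Below is a minimal sketch, not the authors' implementation, of how one might locate contextual tokens on the shortest dependency path between an entity pair and rank them by centrality as candidates for counterfactual intervention. The use of spaCy for dependency parsing, networkx for graph metrics, and the function name `contextual_candidates` are illustrative assumptions, not from the paper.

```python
# Sketch only: identify contextual causal-feature candidates for an entity pair
# using two topological properties of the dependency graph: the shortest path
# between the entities and node (betweenness) centrality.
# Assumes the spaCy model "en_core_web_sm" is installed.
import networkx as nx
import spacy

nlp = spacy.load("en_core_web_sm")

def contextual_candidates(sentence, head, tail):
    """Return tokens on the shortest dependency path between the two entity
    mentions, ranked by betweenness centrality in the dependency graph."""
    doc = nlp(sentence)

    # Build an undirected graph over token indices from parent-child arcs.
    graph = nx.Graph()
    for tok in doc:
        for child in tok.children:
            graph.add_edge(tok.i, child.i)

    # Token indices of the entity mentions (first surface match, for illustration).
    head_idx = next(t.i for t in doc if t.text == head)
    tail_idx = next(t.i for t in doc if t.text == tail)

    # Shortest dependency path between the two entities.
    path = nx.shortest_path(graph, source=head_idx, target=tail_idx)

    # Centrality scores indicate which contextual tokens are structurally pivotal.
    centrality = nx.betweenness_centrality(graph)
    context = [i for i in path if i not in (head_idx, tail_idx)]
    return sorted(((doc[i].text, centrality[i]) for i in context),
                  key=lambda x: -x[1])

# Example: the verb "founded" lies on the path between the entities and carries
# high centrality, making it a natural target for counterfactual substitution.
print(contextual_candidates("Steve Jobs founded Apple in 1976.", "Jobs", "Apple"))
```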