As an essential component of human cognition, cause-effect relations appear frequently in text, and curating them helps build causal networks for predictive tasks. Existing causality extraction techniques include knowledge-based, statistical machine learning (ML)-based, and deep learning-based approaches. Each has its strengths and weaknesses. For example, knowledge-based methods are interpretable but require extensive manual domain knowledge and transfer poorly across domains. Statistical machine learning methods are more automated because they rely on natural language processing (NLP) toolkits. However, feature engineering is labor-intensive, and errors made by the toolkits may propagate to later steps. In the past few years, deep learning techniques have attracted substantial attention from NLP researchers because of their powerful representation learning ability and the rapid increase in computational resources. Their limitations include high computational costs and a lack of adequate annotated training data. In this paper, we conduct a comprehensive survey of causality extraction. We first introduce the primary forms of causality that extraction systems target: explicit intra-sentential causality, implicit causality, and inter-sentential causality. Next, we list benchmark datasets and evaluation methods for causal relation extraction. Then, we present a structured overview of the three techniques with their representative systems. Lastly, we highlight open challenges and potential directions for addressing them.