A fundamental goal of scientific research is to learn about causal relationships. However, despite its critical role in the life and social sciences, causality has not had the same importance in Natural Language Processing (NLP), which has traditionally placed more emphasis on predictive tasks. This distinction is beginning to fade, with an emerging area of interdisciplinary research at the convergence of causal inference and language processing. Still, research on causality in NLP remains scattered across domains without unified definitions, benchmark datasets, and clear articulations of the remaining challenges. In this survey, we consolidate research across academic areas and situate it in the broader NLP landscape. We introduce the statistical challenge of estimating causal effects, encompassing settings where text is used as an outcome, as a treatment, or as a means to address confounding. In addition, we explore potential uses of causal inference to improve the performance, robustness, fairness, and interpretability of NLP models. We thus provide a unified overview of causal inference for the computational linguistics community.