A fundamental goal of scientific research is to learn about causal relationships. However, despite its critical role in the life and social sciences, causality has not had the same importance in Natural Language Processing (NLP), which has traditionally placed more emphasis on predictive tasks. This distinction is beginning to fade, with an emerging area of interdisciplinary research at the convergence of causal inference and language processing. Still, research on causality in NLP remains scattered across domains, without unified definitions, benchmark datasets, or clear articulations of the challenges and opportunities in applying causal inference to the textual domain, with its unique properties. In this survey, we consolidate research across academic areas and situate it in the broader NLP landscape. We introduce the statistical challenge of estimating causal effects with text, encompassing settings where text is used as an outcome, as a treatment, or to address confounding. In addition, we explore potential uses of causal inference to improve the robustness, fairness, and interpretability of NLP models. We thus provide a unified overview of causal inference for the NLP community.