Causal inference is the process of estimating the effect of a treatment on an outcome while accounting for other covariates that may act as confounders (or mediators) and may need to be controlled for. The vast majority of existing methods and systems for causal inference assume that all variables under consideration are categorical or numerical (e.g., gender, price, enrollment). In this paper, we present CausalNLP, a toolkit for inferring causality from observational data that includes text in addition to traditional numerical and categorical variables. CausalNLP uses meta-learners for treatment effect estimation and supports using raw text and its linguistic properties as a treatment, an outcome, or a "controlled-for" variable (e.g., a confounder). The library is open source and available at: https://github.com/amaiya/causalnlp.
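To make the idea of a meta-learner with a text-derived confounder concrete, the following is a minimal, self-contained sketch of a T-learner. It is an illustration under stated assumptions and not the CausalNLP API: the function names (`text_feature`, `fit_group_means`, `t_learner_ate`, `naive_difference`), the keyword-based linguistic property, and the toy records are all hypothetical, and the per-group mean stands in for whatever outcome model a real system would fit.

```python
from statistics import mean


def text_feature(text):
    """Hypothetical linguistic property: 1 if the text contains 'great', else 0."""
    return 1 if "great" in text.lower() else 0


def fit_group_means(rows):
    """Toy outcome model: mean outcome for each covariate value.

    Stands in for any regressor; assumed here only for illustration.
    """
    by_x = {}
    for x, y in rows:
        by_x.setdefault(x, []).append(y)
    return {x: mean(ys) for x, ys in by_x.items()}


def t_learner_ate(records):
    """T-learner: fit separate outcome models for treated and control units,
    then average the difference of their predictions over all units.

    records: list of (text, treatment, outcome) tuples, treatment in {0, 1}.
    Assumes every covariate value occurs in both treatment groups (overlap).
    """
    data = [(text_feature(t), w, y) for t, w, y in records]
    mu1 = fit_group_means([(x, y) for x, w, y in data if w == 1])
    mu0 = fit_group_means([(x, y) for x, w, y in data if w == 0])
    return mean(mu1[x] - mu0[x] for x, _, _ in data)


def naive_difference(records):
    """Unadjusted difference in mean outcome between treated and control."""
    treated = [y for _, w, y in records if w == 1]
    control = [y for _, w, y in records if w == 0]
    return mean(treated) - mean(control)


# Synthetic data (made up for this sketch): outcome = 2 * text_feature + 1 * treatment,
# and units with positive text are more likely to be treated.
records = [
    ("Great product", 1, 3.0),
    ("Works great", 1, 3.0),
    ("A great buy", 1, 3.0),
    ("Great value", 0, 2.0),
    ("It is fine", 1, 1.0),
    ("Arrived late", 0, 0.0),
    ("Nothing special", 0, 0.0),
    ("Would not reorder", 0, 0.0),
]

print("naive difference:", naive_difference(records))
print("T-learner ATE:", t_learner_ate(records))
```

On this synthetic data the text property influences both treatment assignment and outcome, so the unadjusted difference overstates the effect, whereas conditioning on the text-derived covariate recovers the effect built into the data.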