The vast majority of existing methods and systems for causal inference assume that all variables under consideration are categorical or numerical (e.g., gender, price, blood pressure, enrollment). In this paper, we present CausalNLP, a toolkit for inferring causality from observational data that includes text in addition to traditional numerical and categorical variables. CausalNLP uses meta-learners for treatment effect estimation and supports using raw text and its linguistic properties both as a treatment and as a "controlled-for" variable (e.g., a confounder). The library is open-source and available at: https://github.com/amaiya/causalnlp.
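To make the meta-learner idea concrete, the following is a minimal sketch of a T-learner, one common meta-learner, written in plain Python. It is not the CausalNLP API: the function names (`fit_line`, `text_feature`, `t_learner_effects`), the choice of word count as the text-derived covariate, and the noise-free synthetic data are all assumptions made for illustration. A T-learner fits one outcome model on treated units and another on control units, then estimates each unit's treatment effect as the difference between the two predictions.

```python
from statistics import mean


def fit_line(xs, ys):
    """Least-squares fit of y = a + b * x for a single feature; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        # Degenerate case: the feature is constant, so predict the mean outcome.
        return my, 0.0
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def text_feature(text):
    """Hypothetical text-derived covariate: the number of whitespace-separated tokens."""
    return float(len(text.split()))


def t_learner_effects(rows):
    """T-learner over rows of (text, treated, outcome).

    Fits separate outcome models for treated and control units, then returns
    (average effect, per-unit effects), where each per-unit effect is the
    treated-model prediction minus the control-model prediction.
    """
    treated = [(text_feature(t), y) for t, w, y in rows if w == 1]
    control = [(text_feature(t), y) for t, w, y in rows if w == 0]
    a1, b1 = fit_line(*zip(*treated))
    a0, b0 = fit_line(*zip(*control))
    effects = []
    for t, _, _ in rows:
        x = text_feature(t)
        effects.append((a1 + b1 * x) - (a0 + b0 * x))
    return mean(effects), effects


# Synthetic, noise-free data (an assumption for illustration only):
# outcome = 1 + 2 * word_count + 3 * treated, so the true effect is 3.
texts = ["good", "very good", "very good product", "a very good product"]
rows = [
    (text, w, 1 + 2 * len(text.split()) + 3 * w)
    for w in (0, 1)
    for text in texts
]

ate, effects = t_learner_effects(rows)
print(ate)
```

In practice the one-feature linear model would be replaced by a stronger base learner, and the text would be represented by richer features than a token count; the structure of the estimator stays the same.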