Positivity is one of the three conditions for causal inference from observational data. The standard way to validate positivity is to analyze the distribution of the propensity. However, to make causal inference accessible to non-experts, an algorithm is needed that (i) tests positivity and (ii) explains where in the covariate space positivity is lacking. The latter can be used to indicate the limits of further causal analysis and/or to encourage experimentation where positivity is violated. The contribution of this paper is, first, to present the problem of automatic positivity analysis and, second, to propose an algorithm based on a two-step process. The first step models the propensity conditioned on the covariates and then analyzes the distribution of that propensity using multiple hypothesis testing to create positivity violation labels. The second step uses asymmetrically pruned decision trees for explainability. The trees are then converted into readable text that a non-expert can understand. We demonstrate our method on a proprietary dataset of a large software enterprise.
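To make the labeling idea in the first step concrete, the following is a minimal sketch and not the paper's algorithm. Positivity is commonly taken to mean that the treatment probability is strictly between 0 and 1 for every covariate value that occurs. The sketch replaces the propensity model with simple per-stratum treatment counts, tests each stratum against a hypothetical tolerance `eps`, and applies the Benjamini-Hochberg procedure as one possible multiple-testing correction. The function names, `eps`, `alpha`, and the strata are all illustrative assumptions; the decision-tree explanation step is not covered.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def benjamini_hochberg(pvalues, alpha=0.05):
    """Return booleans: True where the null is rejected at FDR level alpha."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest rank k such that p_(k) <= (k / m) * alpha.
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * alpha:
            cutoff = rank
    rejected = [False] * m
    for idx in order[:cutoff]:
        rejected[idx] = True
    return rejected


def label_violations(strata, eps=0.1, alpha=0.05):
    """Label each stratum True if positivity appears to be violated.

    strata maps a stratum name to (n_treated, n_total).
    Null hypothesis per stratum (an assumption of this sketch): the
    treatment probability lies in [eps, 1 - eps].
    """
    names = list(strata)
    pvalues = []
    for name in names:
        k, n = strata[name]
        p_low = binom_cdf(k, n, eps)       # evidence that propensity < eps
        p_high = binom_cdf(n - k, n, eps)  # evidence that propensity > 1 - eps
        # Combine the two one-sided tests with a factor of 2, capped at 1.
        pvalues.append(min(1.0, 2 * min(p_low, p_high)))
    flags = benjamini_hochberg(pvalues, alpha)
    return dict(zip(names, flags))


# Hypothetical strata: (number treated, number of units).
strata = {
    "region=EU": (48, 100),
    "region=US": (0, 80),
    "region=APAC": (97, 100),
    "region=LATAM": (3, 12),
}
labels = label_violations(strata)
for name, violated in labels.items():
    print(name, "violation" if violated else "ok")
```

Under these assumptions a stratum is flagged only when the data give evidence that its treatment probability falls outside the tolerated range, so small strata with few observations tend to remain unflagged rather than being labeled by default.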