Causal modeling provides powerful counterfactual reasoning and interventional mechanisms for generating predictions and reasoning about various what-if scenarios. However, causal discovery from observational data remains a nontrivial task due to unobserved confounding factors, finite sampling, and changes in the data distribution, all of which can lead to spurious cause-effect relationships. To mitigate these challenges in practice, researchers augment causal learning with known causal relations. The goal of this paper is to study the impact of expert knowledge about causal relations, expressed as additional constraints in the formulation of nonparametric NOTEARS. We provide a comprehensive set of comparative analyses of biasing the model with different types of knowledge. We found that (i) knowledge that corrects the mistakes of the NOTEARS model can lead to statistically significant improvements, (ii) constraints on active edges have a larger positive impact on causal discovery than constraints on inactive edges, and, surprisingly, (iii) on average, the induced knowledge does not correct more incorrect active and/or inactive edges than expected. We also demonstrate the behavior of the model and the effectiveness of domain knowledge on a real-world dataset.
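To make the idea of knowledge constraints more concrete, the following is a minimal Python sketch. It assumes a weighted adjacency matrix W in which W[i][j] measures the strength of the edge i -> j (in nonparametric NOTEARS this matrix is derived from the first-layer weights of the per-variable networks). The sketch computes the standard NOTEARS acyclicity measure h(W) = tr(exp(W ∘ W)) − d, which is zero exactly when W encodes a DAG. It also computes a hypothetical penalty that pushes known inactive edges toward zero and known active edges above a threshold tau. The function names, the squared-hinge form of the penalty, and tau are illustrative assumptions, not necessarily the formulation used in the paper.

```python
from math import factorial


def matmul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def acyclicity(w, terms=30):
    """NOTEARS acyclicity measure h(W) = tr(exp(W ∘ W)) - d.

    exp is approximated by a truncated Taylor series, which is adequate
    for small matrices with modest entries.
    """
    d = len(w)
    a = [[w[i][j] ** 2 for j in range(d)] for i in range(d)]  # Hadamard square
    power = [[float(i == j) for j in range(d)] for i in range(d)]  # A^0 = I
    trace_exp = 0.0
    for k in range(terms):
        trace_exp += sum(power[i][i] for i in range(d)) / factorial(k)
        power = matmul(power, a)  # A^(k+1)
    return trace_exp - d


def knowledge_penalty(w, active, inactive, tau=0.3):
    """Hypothetical penalty encoding expert knowledge about edges.

    inactive: (i, j) pairs known to be absent; penalized by w_ij^2.
    active:   (i, j) pairs known to be present; penalized when |w_ij| < tau.
    """
    pen = 0.0
    for i, j in inactive:
        pen += w[i][j] ** 2
    for i, j in active:
        pen += max(0.0, tau - abs(w[i][j])) ** 2
    return pen


if __name__ == "__main__":
    chain = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.0, 0.0, 0.0]]  # 0 -> 1 -> 2
    print(acyclicity(chain))  # a DAG, so h(W) is zero
    print(knowledge_penalty(chain, active={(0, 1)}, inactive={(1, 2)}))
```

In a training loop, a penalty of this kind would be added to the data-fitting loss while h(W) = 0 is enforced, for example with the augmented Lagrangian scheme that NOTEARS uses. A stricter alternative for known inactive edges is to fix those entries to zero outright instead of penalizing them.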