ML4C: Seeing Causality Through Latent Vicinity

Supervised Causal Learning (SCL) aims to learn causal relations from observational data by accessing previously seen datasets associated with ground-truth causal relations. This paper presents a first attempt at addressing a fundamental question: what are the benefits of supervision, and how does supervision help? Starting from the observation that SCL is no better than random guessing when the learning target is non-identifiable a priori, we propose a two-phase paradigm for SCL that explicitly considers structure identifiability. Following this paradigm, we tackle the problem of SCL on discrete data and propose ML4C. The core of ML4C is a binary classifier with a novel learning target: it classifies whether an Unshielded Triple (UT) is a v-structure. Specifically, starting from an input dataset with the corresponding skeleton provided, ML4C orients each UT that is classified as a v-structure, and these v-structures are then used together to construct the final output. To address the fundamental question of SCL, we propose a principled method for ML4C featurization: we exploit the vicinity of a given UT (i.e., the neighbors of the UT in the skeleton) and derive features from the conditional dependencies and structural entanglement within that vicinity. We further prove that ML4C is asymptotically correct. Finally, and most importantly, thorough experiments on benchmark datasets demonstrate that ML4C markedly outperforms other state-of-the-art algorithms in terms of accuracy, reliability, robustness, and tolerance. In summary, ML4C shows promising results in validating the effectiveness of supervision for causal learning. Our code is publicly available at https://github.com/microsoft/ML4C.
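To make the UT-classification step more concrete, here is a minimal Python sketch under stated assumptions. It enumerates the unshielded triples X - T - Y of an undirected skeleton (X and Y both adjacent to T but not to each other), collects each UT's vicinity as the union of the skeleton neighbors of X, T and Y, and orients X → T ← Y whenever a classifier accepts the triple. The classifier is a hypothetical placeholder (`is_v_structure`). ML4C's actual features, which are built from conditional dependencies and structural entanglement within the vicinity, and its trained model are not reproduced here, and this is not the code from the repository. All function names are illustrative.

```python
from itertools import combinations
from typing import Callable, Dict, List, Set, Tuple

# Undirected skeleton: node -> set of adjacent nodes (assumed symmetric).
Skeleton = Dict[str, Set[str]]
# Unshielded triple (X, T, Y): X - T - Y in the skeleton, X and Y not adjacent.
Triple = Tuple[str, str, str]


def unshielded_triples(skel: Skeleton) -> List[Triple]:
    """Enumerate every unshielded triple X - T - Y, each unordered pair once."""
    triples = []
    for t in sorted(skel):
        for x, y in combinations(sorted(skel[t]), 2):
            if y not in skel[x]:  # X and Y non-adjacent -> unshielded
                triples.append((x, t, y))
    return triples


def vicinity(skel: Skeleton, ut: Triple) -> Set[str]:
    """Union of the skeleton neighbors of X, T and Y (this includes X, T, Y themselves)."""
    x, t, y = ut
    return skel[x] | skel[t] | skel[y]


def orient_v_structures(
    skel: Skeleton,
    is_v_structure: Callable[[Triple, Set[str]], bool],
) -> Set[Tuple[str, str]]:
    """Return directed edges (u, v), meaning u -> v, for every UT the classifier accepts."""
    arrows: Set[Tuple[str, str]] = set()
    for ut in unshielded_triples(skel):
        if is_v_structure(ut, vicinity(skel, ut)):
            x, t, y = ut
            arrows.add((x, t))  # X -> T
            arrows.add((y, t))  # Y -> T
    return arrows


if __name__ == "__main__":
    # Skeleton A - C - B plus C - D.
    skel: Skeleton = {"A": {"C"}, "B": {"C"}, "C": {"A", "B", "D"}, "D": {"C"}}

    # Hypothetical stand-in for a trained classifier: accepts only A -> C <- B.
    def toy_classifier(ut: Triple, vic: Set[str]) -> bool:
        return ut == ("A", "C", "B")

    print(unshielded_triples(skel))
    # [('A', 'C', 'B'), ('A', 'C', 'D'), ('B', 'C', 'D')]
    print(sorted(orient_v_structures(skel, toy_classifier)))
    # [('A', 'C'), ('B', 'C')]
```

The sketch only collects arrows. It does not resolve conflicting orientations (for example, when two accepted v-structures direct the same edge in opposite directions) and does not propagate further orientations; a complete method would have to handle both.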