Causal Markov Boundaries

Feature selection is an important problem in machine learning: the goal is to select the variables that lead to an optimal predictive model. In this paper, we focus on feature selection for predicting post-intervention outcomes from pre-intervention variables. We are motivated by healthcare settings, where the goal is often to select the treatment that maximizes a specific patient's outcome; however, there is often too little randomized controlled trial data to identify the conditional treatment effect well. We show how observational data can improve feature selection and effect estimation in two cases: (a) when we know the causal graph and use observational data, and (b) when we do not know the causal graph but have observational and limited experimental data. Our paper extends the notion of a Markov boundary to treatment-outcome pairs, and we provide theoretical guarantees for the methods we introduce. On simulated data, we show that combining observational and experimental data improves feature selection and effect estimation.

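As background for the Markov boundary mentioned in the abstract, the following is a minimal sketch of the classical (non-causal) Markov boundary of a node in a known DAG, which under faithfulness consists of the node's parents, its children, and its children's other parents. It is not the paper's extension to treatment-outcome pairs, which is not reproduced here. The function name `markov_boundary`, the parent-list graph encoding, and the toy variables (`age`, `treatment`, `outcome`, and so on) are all hypothetical and chosen only for illustration.

```python
def markov_boundary(parents, target):
    """Classical Markov boundary of `target` in a DAG.

    `parents` maps every node to the collection of its direct parents
    (an assumed encoding). The result is the union of the target's
    parents, its children, and its children's other parents (spouses).
    """
    pa = set(parents.get(target, ()))
    # Children are the nodes that list `target` among their parents.
    children = {node for node, ps in parents.items() if target in ps}
    # Spouses are the other parents of those children.
    spouses = {p for child in children for p in parents[child]} - {target}
    return pa | children | spouses


# Hypothetical toy graph: age -> treatment, age -> outcome,
# treatment -> outcome, outcome -> followup_lab <- severity,
# and an unrelated variable shoe_size.
toy_dag = {
    "age": [],
    "severity": [],
    "shoe_size": [],
    "treatment": ["age"],
    "outcome": ["age", "treatment"],
    "followup_lab": ["outcome", "severity"],
}

boundary = markov_boundary(toy_dag, "outcome")
print(sorted(boundary))
```

In this toy graph the boundary of `outcome` contains `followup_lab` and `severity`, which enter only through a variable measured after the outcome. This hints at why a boundary built for ordinary prediction need not be the right feature set when only pre-intervention variables are available.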

