This paper considers the problem of estimating unknown intervention targets in a causal directed acyclic graph from observational and interventional data. The focus is on soft interventions in linear structural equation models (SEMs). Current approaches to causal structure learning either assume that the intervention targets are known or rely on hypothesis testing to discover the unknown targets, even for linear SEMs. This severely limits their scalability and hurts their sample complexity. This paper proposes a scalable and efficient algorithm that consistently identifies all intervention targets. The pivotal idea is to estimate the intervention sites from the difference between the precision matrices associated with the observational and interventional datasets, repeating this estimation over different subsets of variables. The proposed algorithm can also be used to update a given observational Markov equivalence class into the interventional Markov equivalence class. Consistency, Markov equivalence, and sample complexity are established analytically. Finally, simulation results on both real and synthetic data demonstrate the gains of the proposed approach for scalable causal structure recovery. An implementation of the algorithm and the code to reproduce the simulation results are available at \url{https://github.com/bvarici/intervention-estimation}.
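The precision-difference idea can be illustrated at the population level with a minimal sketch. It is not the paper's algorithm or the code in the linked repository: the toy graph, the edge weights, and the helper names `precision_matrix` and `changed_nodes` are assumptions made for illustration, and exact precision matrices stand in for estimates from data. For a linear SEM $X = B^\top X + \varepsilon$ with diagonal noise covariance $\Omega$, the precision matrix is $\Theta = (I - B)\,\Omega^{-1}(I - B)^\top$. A soft intervention on one node therefore changes $\Theta$ only in the rows and columns of that node and its parents.

```python
import numpy as np

def precision_matrix(B, noise_var):
    """Population precision matrix of the linear SEM X = B^T X + eps.

    B[i, j] != 0 encodes the edge i -> j; noise_var[j] is Var(eps_j).
    (Hypothetical helper, not taken from the paper's repository.)
    """
    p = B.shape[0]
    I_minus_B = np.eye(p) - B
    return I_minus_B @ np.diag(1.0 / noise_var) @ I_minus_B.T

def changed_nodes(theta_obs, theta_int, tol=1e-10):
    """Indices whose row in the precision difference is not identically zero."""
    delta = theta_obs - theta_int
    return sorted(int(i) for i in np.flatnonzero(np.abs(delta).max(axis=1) > tol))

# Toy DAG (an assumption for illustration): 0 -> 1, 1 -> 2, 0 -> 3, 2 -> 3
B_obs = np.zeros((4, 4))
B_obs[0, 1] = 0.8
B_obs[1, 2] = -0.5
B_obs[0, 3] = 0.7
B_obs[2, 3] = 1.2
var_obs = np.ones(4)

# Soft intervention on node 2: its incoming weight and noise variance change,
# but the edge 1 -> 2 is kept.
B_int = B_obs.copy()
B_int[1, 2] = 0.3
var_int = var_obs.copy()
var_int[2] = 2.0

result = changed_nodes(precision_matrix(B_obs, var_obs),
                       precision_matrix(B_int, var_int))
print(result)  # [1, 2]: the target (node 2) together with its parent (node 1)
```

The support of the difference contains the parent of the intervened node as well as the node itself, so a single comparison over all variables does not isolate the target. The sketch does not attempt the repeated estimation over subsets of variables that the abstract describes.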