Learning causal relationships between variables is a fundamental task in causal inference, and directed acyclic graphs (DAGs) are a popular choice for representing these relationships. Because observational data can recover a causal graph only up to its Markov equivalence class, interventions are often used to complete the recovery. Interventions are generally costly, so it is important to design algorithms that minimize the number of interventions performed. In this work, we study the problem of learning the causal relationships of a subset of edges (target edges) in a graph with as few interventions as possible. Under the assumptions of faithfulness, causal sufficiency, and ideal interventions, we study this problem in two settings: when the underlying ground truth causal graph is known (subset verification) and when it is unknown (subset search). For the subset verification problem, we provide an efficient algorithm to compute a minimum-sized interventional set; we further extend these results to bounded-size non-atomic interventions and node-dependent interventional costs. For the subset search problem, we show that, in the worst case, no algorithm (even with adaptivity or randomization) can achieve an approximation ratio that is asymptotically better than the vertex cover of the target edges when compared with the subset verification number. This result is surprising because there exists a logarithmic approximation algorithm for the search problem when we wish to recover the whole causal graph. To obtain our results, we prove several interesting structural properties of interventional causal graphs that we believe have applications beyond the subset verification/search problems studied here.
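To make the role of the vertex cover concrete, the following Python sketch illustrates the trivial upper bound that the comparison above relies on. It is not the algorithm of this work. It assumes only the standard fact that, under ideal interventions, an atomic intervention on a vertex reveals the orientation of every edge incident to it, so intervening on any vertex cover of the target edges orients all of them. The function names `matching_vertex_cover` and `oriented_by` and the example edges are hypothetical, and the sketch ignores any further orientations that Meek rules might propagate.

```python
from typing import Iterable, Set, Tuple

Edge = Tuple[str, str]  # an unoriented target edge, given as a pair of vertex names


def matching_vertex_cover(target_edges: Iterable[Edge]) -> Set[str]:
    """Return a vertex cover of the target edges via the maximal-matching heuristic.

    The cover has at most twice the minimum size. This is an illustrative
    baseline, not the paper's verification or search algorithm.
    """
    cover: Set[str] = set()
    for u, v in target_edges:
        # An edge with no covered endpoint joins the matching; take both endpoints.
        if u not in cover and v not in cover:
            cover.update((u, v))
    return cover


def oriented_by(interventions: Set[str], edges: Iterable[Edge]) -> Set[Edge]:
    """Edges with at least one intervened endpoint (assumed oriented by atomic interventions)."""
    return {(u, v) for u, v in edges if u in interventions or v in interventions}


# Hypothetical target edges forming a path a - b - c - d.
targets = [("a", "b"), ("b", "c"), ("c", "d")]
cover = matching_vertex_cover(targets)

# Every target edge touches the cover, so all of them are oriented.
assert oriented_by(cover, targets) == set(targets)
```

On this path the heuristic returns all four vertices, while the minimum cover {b, c} has size two, which shows that the heuristic is only a factor-two approximation of the vertex cover itself.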