We address the problem of integrating data from multiple observational and interventional studies in order to compute counterfactuals in structural causal models. We derive a likelihood characterisation for the overall data, which leads us to extend a previous EM-based algorithm from the case of a single study to that of multiple studies. The new algorithm learns to approximate the (unidentifiability) region of model parameters from such mixed data sources. On this basis, it delivers interval approximations to counterfactual results, which collapse to points in the identifiable case. The algorithm is very general: it works on semi-Markovian models with discrete variables and can compute any counterfactual. Moreover, it automatically determines whether a problem is feasible (that is, whether the parameter region is nonempty), a necessary check to avoid returning incorrect results. Systematic numerical experiments show the effectiveness and accuracy of the algorithm, and they hint at the benefits of integrating heterogeneous data to obtain informative bounds when the counterfactual is unidentifiable.
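To make the idea of interval approximations from mixed data concrete, the following is a minimal sketch and not the authors' implementation. It assumes a toy model with binary X and Y, a single latent variable that may confound them, and a canonical parametrisation of that latent variable; every name in it (`STATES`, `compat`, `em`, `pns_interval`, `make_counts`) is hypothetical. It runs EM from several random starting points on pooled observational and interventional counts, then reports the range of one counterfactual, the probability of necessity and sufficiency (PNS), across runs. For simplicity the counts are exact expected counts from a chosen ground truth rather than sampled data.

```python
import itertools
import numpy as np

# Assumed canonical latent states u = (xn, r): xn is the natural value of X,
# r selects the response function of Y: 0: Y=0, 1: Y=X, 2: Y=1-X, 3: Y=1.
STATES = list(itertools.product([0, 1], range(4)))

def f_y(x, r):
    return [0, x, 1 - x, 1][r]

def compat(study, x, y):
    """Indicator over STATES of the states compatible with one record."""
    # In an "obs" study X takes its natural value; in a "do" study X is forced.
    return np.array([float((study == "do" or xn == x) and f_y(x, r) == y)
                     for xn, r in STATES])

def em(counts, rng, iters=500):
    """EM for P(U) from pooled counts {(study, x, y): n}, random start."""
    cells = [(n, compat(*key)) for key, n in counts.items()]
    n_total = sum(n for n, _ in cells)
    theta = rng.dirichlet(np.ones(len(STATES)))
    for _ in range(iters):
        expected = np.zeros(len(STATES))
        for n, mask in cells:
            post = theta * mask                 # E-step: posterior over u
            expected += n * post / post.sum()
        theta = expected / n_total              # M-step: normalised counts
    return theta

def pns_interval(counts, runs=20, seed=0):
    """Range of PNS = P(Y_{x=1}=1, Y_{x=0}=0) over several EM runs."""
    rng = np.random.default_rng(seed)
    values = []
    for _ in range(runs):
        theta = em(counts, rng)
        values.append(sum(t for t, (xn, r) in zip(theta, STATES) if r == 1))
    return min(values), max(values)

def make_counts(theta, n, with_do):
    """Exact expected counts (no sampling noise) for each study."""
    counts = {}
    for x, y in itertools.product([0, 1], repeat=2):
        counts[("obs", x, y)] = n * (theta @ compat("obs", x, y))
        if with_do:
            counts[("do", x, y)] = n * (theta @ compat("do", x, y))
    return counts

# Assumed ground truth: no confounding, P(X=1)=0.5, true PNS = 0.85.
Q = [0.05, 0.85, 0.05, 0.05]
THETA_TRUE = np.array([0.5 * Q[r] for _, r in STATES])

obs_lo, obs_hi = pns_interval(make_counts(THETA_TRUE, 1000, with_do=False))
mix_lo, mix_hi = pns_interval(make_counts(THETA_TRUE, 1000, with_do=True))
print("observational only:", (obs_lo, obs_hi))
print("observational + interventional:", (mix_lo, mix_hi))
```

In this toy setting, the interval obtained from the pooled data should be visibly narrower than the one from observational data alone. Because each run only finds one point of the parameter region, the reported range is an inner approximation of the true bounds; the sketch also omits the feasibility check that the abstract describes.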