Describing the causal relations governing a system is a fundamental task in many scientific fields, ideally addressed by experimental studies. However, obtaining data under intervention scenarios may not always be feasible, while discovering causal relations from purely observational data is notoriously challenging. In certain settings, such as genomics, we may have data from heterogeneous study conditions, with soft (partial) interventions pertaining only to a subset of the study variables, whose effects and targets are possibly unknown. Combining data from experimental and observational studies offers the opportunity to leverage both domains and improve the identifiability of causal structures. To this end, we define the interventional BGe score for a mixture of observational and interventional data, where the targets and effects of intervention may be unknown. To demonstrate the approach, we compare its performance to other state-of-the-art algorithms, both in simulations and in data analysis applications. A distinguishing feature of our method is that it takes a Bayesian perspective, leading to a full characterisation of the posterior distribution of the DAG structures. Given a sample of DAGs, one can also automatically derive full posterior distributions of the intervention effects. Consequently, the method effectively captures the uncertainty in both the structure and the parameter estimates. Code to reproduce the simulations and analyses is publicly available at github.com/jackkuipers/iBGe.