Causal Discovery and Optimal Experimental Design for Genome-Scale Biological Network Recovery

Causal discovery of genome-scale networks is important for identifying pathways from genes to observable traits, e.g. differences in cell function, disease, and drug resistance, among others. Causal learners based on graphical models rely on interventional samples to orient edges in the network. However, these models have not been shown to scale up to the size of the genome, which is on the order of 1e3-1e4 genes. We introduce a new learner, SP-GIES, that jointly learns from interventional and observational datasets and achieves an almost 4x speedup over an existing learner on 1,000-node networks. SP-GIES achieves an AUC-PR score of 0.91 on 1,000-node networks and scales up to 2,000-node networks, which is 4x larger than in existing work. We also show how SP-GIES improves downstream optimal experimental design strategies for selecting interventional experiments to perform on the system. This is an important step toward realizing causal discovery at scale via autonomous experimental design.
