Counterfactual inference is a useful tool for comparing the outcomes of interventions on complex systems. It requires us to represent the system in the form of a structural causal model, complete with a causal diagram, probabilistic assumptions on exogenous variables, and functional assignments. Specifying such models can be extremely difficult in practice. The process requires substantial domain expertise and does not scale easily to large systems, multiple systems, or novel system modifications. At the same time, many application domains, such as molecular biology, are rich in structured causal knowledge that is qualitative in nature. This manuscript proposes a general approach for querying a causal biological knowledge graph and converting the qualitative result into a quantitative structural causal model that can learn from data to answer the question. We demonstrate the feasibility, accuracy, and versatility of this approach using two case studies in systems biology. The first demonstrates the appropriateness of the underlying assumptions and the accuracy of the results. The second demonstrates the versatility of the approach by querying a knowledge base for the molecular determinants of a cytokine storm induced by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), and performing counterfactual inference to estimate the causal effect of medical countermeasures for severely ill patients.