Latent variable models (LVMs) are probabilistic models in which some variables are hidden during training. A broad class of LVMs has a directed acyclic graphical structure. This directed structure suggests an intuitive causal explanation of the data-generating process: a latent topic model, for example, suggests that topics cause the occurrence of tokens. Despite this intuitive causal interpretation, a directed acyclic latent variable model trained on data is generally insufficient for causal reasoning, because the required model parameters may not be uniquely identified. In this manuscript we demonstrate that an LVM can answer any causal query posed post-training, provided the query is identifiable from the observed variables under the rules of do-calculus. We show that causal reasoning can enhance a broad class of LVMs long established in the probabilistic modeling community, and we demonstrate its effectiveness on several case studies: a machine learning model with multiple causes, where a set of latent confounders and a mediator lie between the causes and the outcome variable; a study in which the identifiable causal query cannot be estimated using the front-door or back-door criterion; a case study that captures unobserved crosstalk between two biological signaling pathways; and a COVID-19 expert system that identifies multiple causal queries.
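The distinction between observational and interventional queries mentioned above can be made concrete with a minimal sketch. This is not the manuscript's method, only an illustration of back-door adjustment on a toy discrete model Z → X, Z → Y, X → Y; all variable names and probability tables are hypothetical.

```python
# Hedged sketch: back-door adjustment on a toy three-variable model.
# Z confounds X and Y, so P(Y | do(X=x)) differs from P(Y | X=x).
import numpy as np

# Conditional tables (arbitrary illustrative numbers), binary variables.
p_z = np.array([0.6, 0.4])                           # P(Z)
p_x_given_z = np.array([[0.7, 0.3],                  # P(X | Z=0)
                        [0.2, 0.8]])                 # P(X | Z=1)
p_y_given_xz = np.array([[[0.9, 0.1], [0.4, 0.6]],   # P(Y | X, Z=0)
                         [[0.5, 0.5], [0.1, 0.9]]])  # P(Y | X, Z=1)

# Joint distribution P(Z, X, Y), indexed [z, x, y].
joint = np.einsum('z,zx,zxy->zxy', p_z, p_x_given_z, p_y_given_xz)

def p_y_do_x(joint, x):
    """P(Y | do(X=x)) via back-door adjustment on Z:
       sum_z P(Y | X=x, Z=z) * P(Z=z)."""
    p_z = joint.sum(axis=(1, 2))                     # marginal P(Z)
    cond = joint[:, x, :] / joint[:, x, :].sum(axis=1, keepdims=True)
    return np.einsum('z,zy->y', p_z, cond)

def p_y_given_x(joint, x):
    """Observational P(Y | X=x); confounded by Z."""
    slice_xy = joint[:, x, :].sum(axis=0)
    return slice_xy / slice_xy.sum()

print(p_y_do_x(joint, 1))     # interventional distribution over Y
print(p_y_given_x(joint, 1))  # observational distribution over Y (differs)
```

The gap between the two printed distributions is exactly what the abstract refers to: a trained model's conditionals alone are not causal answers; an identification step (here the back-door formula) is needed to turn them into one.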