Generative models can be trained to emulate complex empirical data, but can they be used to make predictions in previously unobserved environments? An intuitive way to promote such extrapolation capabilities is to have the architecture of the model reflect a causal graph of the true data-generating process, such that one can intervene on each node independently of the others. However, the nodes of this graph are usually unobserved, leading to overparameterization and a lack of identifiability of the causal structure. We develop a theoretical framework to address this challenging situation by defining a weaker form of identifiability based on the principle of independence of mechanisms. We demonstrate on toy examples that classical stochastic gradient descent can hinder the model's extrapolation capabilities, suggesting that independence of mechanisms should be enforced explicitly during training. Experiments on deep generative models trained on real-world data support these insights and illustrate how the extrapolation capabilities of such models can be leveraged.
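As a minimal illustrative sketch (not the model studied in the paper), the following hypothetical two-node example, with all mechanisms and parameters invented purely for illustration, shows what it means to intervene on one node of a causal graph independently of the others:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-node causal graph X -> Y: each node has its own mechanism.
def mechanism_x(n, shift=0.0):
    # Root cause: X is drawn from its own marginal distribution.
    return rng.normal(loc=shift, scale=1.0, size=n)

def mechanism_y(x, slope=2.0, noise=0.1):
    # Effect: Y is generated from X by a separate, independent mechanism.
    return slope * x + rng.normal(scale=noise, size=x.shape)

# Sample from the observational distribution.
x = mechanism_x(1000)
y = mechanism_y(x)

# Intervention: change only the mechanism of X (a previously unobserved
# environment), leaving the mechanism of Y untouched.
x_intervened = mechanism_x(1000, shift=5.0)
y_intervened = mechanism_y(x_intervened)

print(y.mean(), y_intervened.mean())  # the Y-mechanism transfers unchanged
```

Because the mechanism generating Y is represented separately from that of X, it carries over unchanged to the new environment; a monolithic model of the joint distribution would offer no such handle for intervention.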