Training and using modern neural-network-based latent-variable generative models (like Variational Autoencoders) often requires simultaneously training a generative direction along with an inferential (encoding) direction, which approximates the posterior distribution over the latent variables. Thus, the question arises: how complex does the inferential model need to be in order to accurately model the posterior distribution of a given generative model? In this paper, we identify an important property of the generative map that impacts the required size of the encoder. We show that if the generative map is "strongly invertible" (in a sense we suitably formalize), the inferential model need not be much more complex. Conversely, we prove that there exist non-invertible generative maps for which the encoding direction needs to be exponentially larger (under standard assumptions in computational complexity). Importantly, we do not require the generative model to be layerwise invertible, an assumption made in much of the related literature that is not satisfied by many architectures used in practice (e.g., convolution- and pooling-based networks). Thus, we provide theoretical support for the empirical wisdom that learning deep generative models is harder when the data lies on a low-dimensional manifold.
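For concreteness, the standard formalization of this setup (our notation here; the abstract does not spell it out) trains a generative map $p_\theta(x \mid z)$ jointly with an encoder $q_\phi(z \mid x)$ by maximizing the evidence lower bound
$$\log p_\theta(x) \;\ge\; \mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big] \;-\; \mathrm{KL}\big(q_\phi(z \mid x)\,\|\,p(z)\big),$$
where the slack in the bound equals $\mathrm{KL}\big(q_\phi(z \mid x)\,\|\,p_\theta(z \mid x)\big)$. The question of encoder complexity is thus the question of how large $q_\phi$ must be for this posterior-approximation gap to be small.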