Many generative models synthesize data by transforming a standard Gaussian random variable using a deterministic neural network. Among these models are Variational Autoencoders and Generative Adversarial Networks. In this work, we call them "push-forward" models and study their expressivity. We show that the Lipschitz constant of these generative networks has to be large in order to fit multimodal distributions. More precisely, we show that the total variation distance and the Kullback-Leibler divergence between the generated and the data distributions are bounded from below by a constant depending on the mode separation and the Lipschitz constant. Since constraining the Lipschitz constants of neural networks is a common way to stabilize generative models, there is a provable trade-off between the ability of push-forward models to approximate multimodal distributions and the stability of their training. We validate our findings on one-dimensional and image datasets and empirically show that generative models consisting of stacked networks with stochastic input at each step, such as diffusion models, do not suffer from such limitations.
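The trade-off described above can be observed with a minimal one-dimensional experiment. The sketch below, which is not the authors' code, fits a push-forward network G(z), z ~ N(0, 1), to a bimodal Gaussian mixture using the exact 1-D Wasserstein-1 loss (sorted-sample matching), with and without a per-layer spectral-norm (Lipschitz) constraint. The mode separation `DELTA`, the network size, and the loss choice are assumptions made for illustration only; the paper's experimental setup may differ.

```python
# Illustrative sketch: Lipschitz-constrained push-forward models and multimodality.
import torch
import torch.nn as nn

torch.manual_seed(0)

DELTA = 8.0   # distance between the two modes (hypothetical value)
BATCH = 2048

def sample_target(n):
    """Bimodal target: equal mixture of N(-DELTA/2, 0.1) and N(+DELTA/2, 0.1)."""
    signs = torch.randint(0, 2, (n,)).float() * 2 - 1
    return signs * DELTA / 2 + 0.1 * torch.randn(n)

def make_generator(constrain_lipschitz: bool) -> nn.Module:
    """Small MLP push-forward model; optionally 1-Lipschitz per layer."""
    def linear(i, o):
        layer = nn.Linear(i, o)
        if constrain_lipschitz:
            layer = nn.utils.parametrizations.spectral_norm(layer)
        return layer
    return nn.Sequential(linear(1, 64), nn.ReLU(),
                         linear(64, 64), nn.ReLU(),
                         linear(64, 1))

def wasserstein_1d(a, b):
    """Exact 1-D Wasserstein-1 distance between equal-size samples."""
    return (torch.sort(a).values - torch.sort(b).values).abs().mean()

for constrained in (True, False):
    gen = make_generator(constrained)
    opt = torch.optim.Adam(gen.parameters(), lr=1e-3)
    for step in range(3000):
        z = torch.randn(BATCH, 1)
        fake = gen(z).squeeze(-1)
        real = sample_target(BATCH)
        loss = wasserstein_1d(fake, real)
        opt.zero_grad()
        loss.backward()
        opt.step()
    # The spectral-norm constraint bounds the network's Lipschitz constant, so the
    # generator cannot map the latent Gaussian sharply onto two well-separated modes
    # and leaves mass in the low-density region between them; the unconstrained
    # generator typically fits both modes much more closely.
    print(f"constrained={constrained}: final W1 = {loss.item():.3f}")
```

Increasing `DELTA` while keeping the spectral-norm constraint makes the gap between the constrained and unconstrained fits larger, which is the qualitative behavior the lower bounds above predict.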