Variational autoencoders (VAEs) are a popular class of deep generative models with many variants and a wide range of applications. Improvements upon the standard VAE mostly focus on modelling the posterior distribution over the latent space and on the properties of the neural network decoder. In contrast, improving the model of the observational distribution is rarely considered; it typically defaults to a pixel-wise independent categorical or normal distribution. In image synthesis, sampling from such distributions produces spatially-incoherent results with uncorrelated pixel noise, so that only the sample mean is of practical use as an output prediction. In this paper, we aim to stay true to VAE theory by improving the samples drawn from the observational distribution. We propose SOS-VAE, an alternative model for the observation space that encodes spatial dependencies via a low-rank parameterisation. We demonstrate that this new observational distribution can capture relevant covariance between pixels, resulting in spatially-coherent samples. In contrast to pixel-wise independent distributions, our samples seem to contain semantically-meaningful variations from the mean, allowing multiple plausible outputs to be predicted with a single forward pass.
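To make the low-rank parameterisation concrete, below is a minimal sketch, assuming a PyTorch implementation, of how a decoder's outputs could define such an observation distribution over flattened pixels. It uses torch.distributions.LowRankMultivariateNormal, whose covariance is D + F F^T for a diagonal D and low-rank factor F; the function name observation_distribution and all tensor names are illustrative assumptions, not the authors' code.

```python
# Minimal sketch (not the authors' implementation) of a low-rank multivariate
# normal observation model p(x|z). The decoder is assumed to output, per image,
# a pixel mean, a log-diagonal term, and a low-rank covariance factor.
import torch
from torch.distributions import LowRankMultivariateNormal

def observation_distribution(mean, log_diag, cov_factor):
    """Build p(x|z) with covariance D + F F^T over flattened pixels.

    mean:       (batch, num_pixels)     predicted pixel means
    log_diag:   (batch, num_pixels)     log of the diagonal variances D
    cov_factor: (batch, num_pixels, r)  low-rank factor F with rank r << num_pixels
    """
    return LowRankMultivariateNormal(
        loc=mean,
        cov_factor=cov_factor,
        cov_diag=log_diag.exp(),  # exponentiate to guarantee a positive diagonal
    )

# Usage: a single sample draws spatially-correlated pixel noise in one pass.
batch, num_pixels, rank = 4, 32 * 32, 10
mean = torch.zeros(batch, num_pixels)
log_diag = torch.zeros(batch, num_pixels)
cov_factor = 0.1 * torch.randn(batch, num_pixels, rank)

p_x_given_z = observation_distribution(mean, log_diag, cov_factor)
samples = p_x_given_z.rsample()           # (batch, num_pixels), correlated noise
log_prob = p_x_given_z.log_prob(samples)  # per-image log-likelihood, shape (batch,)
```

Under this sketch, the rank r trades off how much structured covariance between pixels the model can express against the cost of predicting the factor F, which is what lets samples vary coherently from the mean rather than with independent per-pixel noise.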