Deep generative models have been successfully applied to learning non-linear data distributions through a set of latent variables: a nonlinear function (the generator) maps latent samples into the data space. However, the nonlinearity of the generator implies that the latent space provides an unsatisfactory projection of the data space, which results in poor representation learning. This weak projection can be addressed with a Riemannian metric, and we show that computing geodesics and performing accurate interpolations between data samples on the Riemannian manifold can substantially improve the performance of deep generative models. In this paper, we propose a Variational spatial-Transformer AutoEncoder (VTAE) that minimizes geodesic distances on a Riemannian manifold to improve representation learning. In particular, we carefully design a variational autoencoder with an encoded spatial-Transformer that explicitly extends the latent variable model to data on a Riemannian manifold and provides global context modelling. Moreover, to obtain smooth and plausible interpolations when traversing between the latent representations of two different objects, we propose a geodesic interpolation network, in contrast to existing models that rely on linear interpolation and achieve inferior performance. Experiments on benchmarks show that our proposed model improves predictive accuracy and versatility across a range of computer vision tasks, including image interpolation and reconstruction.
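The abstract contrasts the proposed geodesic interpolation network with plain linear interpolation in latent space. The network itself is not described here; as a rough illustration of the underlying idea only, below is a minimal PyTorch-style sketch (all names such as `decoder` and `geodesic_interpolation` are hypothetical, not the paper's implementation). It approximates a geodesic by starting from the linear path and minimizing the discrete curve energy of the decoded path, which corresponds to geodesic energy under the metric pulled back through the decoder.

```python
import torch

def geodesic_interpolation(decoder, z0, z1, n_points=16, steps=500, lr=1e-2):
    """Approximate a geodesic between latent codes z0 and z1 (shape (d,))
    under the metric pulled back through `decoder`, by minimizing the
    discrete curve energy sum_i ||g(z_{i+1}) - g(z_i)||^2 in data space.

    NOTE: illustrative sketch, not the VTAE geodesic interpolation network.
    Starts from the straight line (linear interpolation) and refines the
    interior points; the endpoints stay fixed.
    """
    t = torch.linspace(0, 1, n_points).unsqueeze(1)      # (n_points, 1)
    path = (1 - t) * z0 + t * z1                         # linear initialization
    inner = path[1:-1].clone().requires_grad_(True)      # free interior points
    opt = torch.optim.Adam([inner], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        curve = torch.cat([z0.unsqueeze(0), inner, z1.unsqueeze(0)], dim=0)
        x = decoder(curve)                               # decode the whole curve
        energy = (x[1:] - x[:-1]).pow(2).sum()           # discrete curve energy
        energy.backward()
        opt.step()
    return torch.cat([z0.unsqueeze(0), inner.detach(), z1.unsqueeze(0)], dim=0)
```

Decoding the returned curve, rather than the straight line between `z0` and `z1`, is what yields the smoother and more plausible interpolations the abstract refers to: the straight line may leave the region of latent space the decoder maps well, while the energy-minimizing curve stays close to the data manifold.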