The Wasserstein autoencoder (WAE) shows that matching two distributions is equivalent to minimizing a simple autoencoder (AE) loss under the constraint that the latent space of the AE matches a pre-specified prior distribution. This latent-space distribution matching is a core component of WAE, and a challenging task. In this paper, we propose to use the contrastive learning framework, which has proven effective for self-supervised representation learning, as a means to address this problem. We do so by exploiting the fact that contrastive learning objectives drive the latent distribution toward uniformity over the unit hypersphere, from which samples can easily be drawn. We show that optimizing the WAE loss with the contrastive learning framework achieves faster convergence and more stable optimization than existing popular WAE algorithms. This is also reflected in the FID scores on the CelebA and CIFAR-10 datasets, and in the realistic quality of images generated on the CelebA-HQ dataset.
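To make the idea concrete, the following is a minimal PyTorch-style sketch, not the authors' implementation: an autoencoder trained with a reconstruction loss plus an InfoNCE contrastive term on L2-normalized latents, so that the latent distribution tends toward uniformity on the unit hypersphere, which can then be sampled directly for generation. The toy MLP architectures, noise "augmentations", and hyperparameters (latent_dim, tau, lam) are illustrative assumptions, not values from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

latent_dim, batch_size, tau, lam = 64, 128, 0.1, 1.0

# Toy MLP encoder/decoder for 3x32x32 images (illustrative only).
encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 512), nn.ReLU(),
                        nn.Linear(512, latent_dim))
decoder = nn.Sequential(nn.Linear(latent_dim, 512), nn.ReLU(),
                        nn.Linear(512, 3 * 32 * 32))
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)

def info_nce(z1, z2, tau):
    # z1, z2: L2-normalized latents of two views of the same batch; matching
    # rows are positives, all other rows in the batch serve as negatives.
    logits = z1 @ z2.t() / tau
    labels = torch.arange(z1.size(0))
    return F.cross_entropy(logits, labels)

x = torch.rand(batch_size, 3, 32, 32)                      # stand-in for a data batch
x1 = x + 0.05 * torch.randn_like(x)                        # two "augmented" views
x2 = x + 0.05 * torch.randn_like(x)

z1 = F.normalize(encoder(x1), dim=1)                       # latents on the unit sphere
z2 = F.normalize(encoder(x2), dim=1)
recon = decoder(z1).view_as(x)

# AE reconstruction loss + contrastive term replacing explicit prior matching.
loss = F.mse_loss(recon, x1) + lam * info_nce(z1, z2, tau)
loss.backward(); opt.step(); opt.zero_grad()

# Generation: sample uniformly on the unit hypersphere and decode.
z = F.normalize(torch.randn(16, latent_dim), dim=1)
samples = decoder(z).view(16, 3, 32, 32)
```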