Introspective models such as IntroVAE and S-IntroVAE have recently excelled at image generation and reconstruction. Their defining characteristic is adversarial training of the VAE, in which the encoder attempts to distinguish real images from fake (i.e., synthesized) ones. However, because no effective metric is available for evaluating the difference between real and fake images, posterior collapse and the vanishing gradient problem persist, reducing the fidelity of the synthesized images. In this paper, we propose a new variant of IntroVAE called the Adversarial Similarity Distance Introspective Variational Autoencoder (AS-IntroVAE). We theoretically analyze the vanishing gradient problem and construct a new Adversarial Similarity Distance (AS-Distance) using the 2-Wasserstein distance and the kernel trick. With weight annealing on the AS-Distance and the KL divergence, AS-IntroVAE is able to generate stable, high-quality images. We address posterior collapse by making per-batch attempts to transform the images so that they better fit the prior distribution in the latent space. Compared with a per-image approach, this strategy fosters more diverse distributions in the latent space, allowing our model to produce images of great diversity. Comprehensive experiments on benchmark datasets demonstrate the effectiveness of AS-IntroVAE on image generation and reconstruction tasks.
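As a rough illustration of two ingredients named above, the sketch below computes the squared 2-Wasserstein distance between two diagonal Gaussians (a standard closed form) and a linear weight-annealing schedule. It is not the AS-Distance itself: the kernel-trick component is omitted, and the function names, the linear schedule, and its start and end values are assumptions made only for illustration.

```python
import math


def w2_squared_diag_gaussians(mu_p, sigma_p, mu_q, sigma_q):
    """Squared 2-Wasserstein distance between N(mu_p, diag(sigma_p^2))
    and N(mu_q, diag(sigma_q^2)).

    For diagonal covariances the closed form reduces to
    ||mu_p - mu_q||^2 + ||sigma_p - sigma_q||^2.
    """
    mean_term = sum((a - b) ** 2 for a, b in zip(mu_p, mu_q))
    std_term = sum((a - b) ** 2 for a, b in zip(sigma_p, sigma_q))
    return mean_term + std_term


def linear_anneal(step, total_steps, start=1.0, end=0.0):
    """Hypothetical linear schedule that moves a loss weight from
    `start` to `end` over `total_steps`, then holds it at `end`."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return start + (end - start) * frac


if __name__ == "__main__":
    # Two 2-D Gaussians with equal standard deviations and shifted means.
    d2 = w2_squared_diag_gaussians([0.0, 0.0], [1.0, 1.0], [3.0, 4.0], [1.0, 1.0])
    print("W2 distance:", math.sqrt(d2))
    # Weight halfway through a 10-step schedule.
    print("annealed weight:", linear_anneal(5, 10))
```

In a training loop, a weight from such a schedule could scale one loss term while its complement scales another; how the two terms are actually balanced in AS-IntroVAE is not specified here.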