Training Variational Autoencoders (VAEs) to generate realistic imagery requires a loss function that reflects human perception of image similarity. We propose such a loss function based on Watson's perceptual model, which computes a weighted distance in frequency space and accounts for luminance and contrast masking. We extend the model to color images, increase its robustness to translation by using the Fourier Transform, remove artifacts caused by splitting the image into blocks, and make it differentiable. In experiments, VAEs trained with the new loss function generated realistic, high-quality image samples. Compared to using the Euclidean distance and the Structural Similarity Index, the images were less blurry; compared to deep neural network based losses, the new approach required fewer computational resources and generated images with fewer artifacts.
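To make the core ingredient concrete, below is a minimal sketch (not the authors' implementation) of a Fourier-domain perceptual distance in PyTorch. It compares weighted magnitude spectra of two image batches, which illustrates why a Fourier-based comparison is robust to small translations; the luminance- and contrast-masking terms of Watson's model, the block-wise processing, and the color extension are omitted, and the weights argument is a hypothetical stand-in for a frequency-sensitivity table.

import torch

def fourier_perceptual_distance(x, y, weights=None):
    # x, y: image batches of shape (B, C, H, W), values in [0, 1].
    # weights: optional per-frequency sensitivity weights of shape (H, W);
    #          defaults to uniform weighting (hypothetical placeholder).
    X = torch.fft.fft2(x, norm="ortho")
    Y = torch.fft.fft2(y, norm="ortho")
    # Comparing magnitudes makes the distance insensitive to global shifts,
    # since a translation only changes the phase of the Fourier coefficients.
    mag_x, mag_y = X.abs(), Y.abs()
    if weights is None:
        weights = torch.ones(x.shape[-2:], device=x.device)
    return (weights * (mag_x - mag_y) ** 2).mean()

# Usage sketch: as a reconstruction term when training a VAE,
# loss = fourier_perceptual_distance(reconstruction, target) + kl_term
Because the function is built from differentiable tensor operations, gradients flow through it, which is the property the abstract highlights as necessary for training VAEs with such a loss.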