Assessing the performance of Generative Adversarial Networks (GANs) is an important problem with clear practical significance. Although several evaluation metrics have been proposed, they generally assess the quality of the whole generated image distribution. For Reference-guided Image Synthesis (RIS) tasks, i.e., rendering a source image in the style of another reference image, assessing the quality of a single generated image is crucial, and these metrics are therefore not applicable. In this paper, we propose a general learning-based framework, Reference-guided Image Synthesis Assessment (RISA), to quantitatively evaluate the quality of a single generated image. Notably, training RISA does not require human annotations. Specifically, the training data for RISA are produced by intermediate models saved during the RIS training procedure and are weakly annotated with the models' iteration counts, exploiting the positive correlation between image quality and the number of training iterations. Since this annotation is too coarse to serve as a supervision signal, we introduce two techniques: 1) a pixel-wise interpolation scheme that refines the coarse labels, and 2) multiple binary classifiers that replace a na\"ive regressor. In addition, an unsupervised contrastive loss is introduced to effectively capture the style similarity between a generated image and its reference image. Empirical results on various datasets demonstrate that RISA is highly consistent with human preference and transfers well across models.
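The "multiple binary classifiers" idea can be read as an ordinal-regression-style encoding of the coarse iteration-based labels. The sketch below is purely illustrative and assumes such an encoding; the function names, the number of quality levels, and the decoding-by-counting rule are our own assumptions, not details taken from the paper.

```python
# Hypothetical sketch: encode a coarse quality rank r in {0, ..., K-1}
# (e.g., derived from the RIS model's training iteration) as K-1 binary
# targets, one per binary classifier, instead of regressing r directly.
# All names here are illustrative, not from the paper.

def encode_ordinal(rank, num_levels):
    """Binary target for classifier k is 1 iff rank > k (ordinal encoding)."""
    return [1 if rank > k else 0 for k in range(num_levels - 1)]

def decode_ordinal(probs, threshold=0.5):
    """Recover a scalar quality score by counting confident 'above level k' votes."""
    return sum(1 for p in probs if p > threshold)

# Example: rank 2 out of 5 levels becomes four binary targets,
# and thresholded classifier outputs are decoded back to a score.
targets = encode_ordinal(2, 5)          # [1, 1, 0, 0]
score = decode_ordinal([0.9, 0.8, 0.3, 0.1])  # 2
```

Decomposing a regression target into binary sub-problems in this way is a standard trick for ordinal labels: each classifier only has to answer an easier "better than level k?" question, which is more robust to the coarseness of iteration-based supervision than fitting an exact scalar.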