This work evaluates the robustness of quality measures of generative models such as the Inception Score (IS) and the Fr\'echet Inception Distance (FID). Analogous to the vulnerability of deep models to a variety of adversarial attacks, we show that such metrics can also be manipulated by additive pixel perturbations. Our experiments indicate that one can generate a distribution of images with very high scores but low perceptual quality. Conversely, one can optimize for small, imperceptible perturbations that, when added to real-world images, deteriorate their scores. We further extend our evaluation to generative models themselves, including the state-of-the-art StyleGANv2. We show that both the generative model and the FID are vulnerable to additive perturbations in the latent space. Finally, we show that the FID can be robustified by simply replacing the standard Inception model with a robust one. We validate the effectiveness of the robustified metric through extensive experiments, showing that it is more resistant to manipulation.
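For context (not part of the original abstract), the FID attacked above is the Fr\'echet distance between two Gaussians fitted to Inception embeddings of the real and generated image distributions; its standard form is
\[
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2 + \operatorname{Tr}\!\left(\Sigma_r + \Sigma_g - 2\left(\Sigma_r \Sigma_g\right)^{1/2}\right),
\]
where $(\mu_r, \Sigma_r)$ and $(\mu_g, \Sigma_g)$ are the mean and covariance of the Inception features of the real and generated images, respectively. Additive perturbations in pixel or latent space shift these feature statistics, which is the lever the manipulations described above exploit.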