Fréchet Inception Distance (FID) is the primary metric for ranking models in data-driven generative modeling. While remarkably successful, the metric is known to sometimes disagree with human judgement. We investigate a root cause of these discrepancies, and visualize what FID "looks at" in generated images. We show that the feature space that FID is (typically) computed in is so close to the ImageNet classifications that aligning the histograms of Top-$N$ classifications between sets of generated and real images can reduce FID substantially -- without actually improving the quality of results. Thus we conclude that FID is prone to intentional or accidental distortions. As a practical example of an accidental distortion, we discuss a case where an ImageNet pre-trained FastGAN achieves an FID comparable to StyleGAN2, while being worse in terms of human evaluation.
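For reference, FID is the Fréchet distance between two Gaussians fitted to feature vectors of real and generated images: $\|\mu_r - \mu_g\|^2 + \mathrm{Tr}\left(\Sigma_r + \Sigma_g - 2(\Sigma_r \Sigma_g)^{1/2}\right)$. The following is a minimal sketch of that formula only, not the authors' code. The function name `frechet_distance` is hypothetical, and the inputs are assumed to be feature arrays (one row per image) that have already been extracted by an Inception network.

```python
import numpy as np


def frechet_distance(feats_real, feats_gen):
    """Fréchet distance between Gaussians fitted to two feature sets.

    Both inputs are assumed to be arrays of shape (num_images, feature_dim)
    holding pre-extracted features; feature extraction is not shown here.
    """
    mu_r = feats_real.mean(axis=0)
    mu_g = feats_gen.mean(axis=0)
    cov_r = np.cov(feats_real, rowvar=False)
    cov_g = np.cov(feats_gen, rowvar=False)

    # Tr((cov_r cov_g)^{1/2}) equals the sum of the square roots of the
    # eigenvalues of cov_r @ cov_g, which are real and non-negative for
    # positive semi-definite covariances. Small negative or imaginary
    # parts caused by round-off are discarded.
    eigvals = np.linalg.eigvals(cov_r @ cov_g)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    mean_term = np.sum((mu_r - mu_g) ** 2)
    return float(mean_term + np.trace(cov_r) + np.trace(cov_g) - 2.0 * trace_sqrt)
```

Because the distance depends only on the means and covariances of the features, any change that moves those statistics closer together lowers the score, whether or not the images themselves look better.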