We investigate the sensitivity of the Fr\'echet Inception Distance (FID) score to inconsistent and often incorrect implementations across different image processing libraries. The FID score is widely used to evaluate generative models, but each FID implementation uses a different low-level image processing pipeline. Image resizing functions in commonly used deep learning libraries often introduce aliasing artifacts. We observe that numerous subtle choices must be made when calculating FID, and that a lack of consistency in these choices can lead to vastly different FID scores. In particular, we show that the following choices are significant: (1) which image resizing library to use, (2) which interpolation kernel to use, and (3) which encoding to use when representing images. We additionally outline numerous common pitfalls that should be avoided and provide recommendations for computing the FID score accurately. We provide an easy-to-use, optimized implementation of our proposed recommendations in the accompanying code.
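To make the aliasing issue concrete, the following is a minimal sketch in NumPy. It is not the accompanying implementation, and it does not reproduce the behavior of any particular library. The helper names `downsample_naive` and `downsample_box` are hypothetical, and the box filter stands in for antialiased resizing in general. The sketch downsamples a striped test image by a factor of 2, once by plain subsampling and once by block averaging.

```python
import numpy as np


def downsample_naive(img, factor):
    """Keep every `factor`-th pixel with no low-pass filtering (prone to aliasing)."""
    return img[::factor, ::factor]


def downsample_box(img, factor):
    """Average each factor x factor block (a simple antialiasing box filter)."""
    h, w = img.shape
    # Crop so that the blocks tile the image evenly.
    h, w = h - h % factor, w - w % factor
    blocks = img[:h, :w].reshape(h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(1, 3))


# Vertical stripes with a period of 2 pixels: columns alternate 0, 1, 0, 1, ...
stripes = np.tile(np.array([0.0, 1.0]), (8, 4))  # shape (8, 8)

naive = downsample_naive(stripes, 2)  # samples only the even (all-zero) columns
smooth = downsample_box(stripes, 2)   # averages each 2x2 block

print(naive)   # every entry is 0.0: the stripes vanish into solid black
print(smooth)  # every entry is 0.5: the average brightness is preserved
```

The two outputs come from the same input yet differ at every pixel. Any feature statistics computed from such images, including those that feed into FID, would then differ depending on which resizing routine was used.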