Image quality assessment (IQA) provides a fundamental metric for image processing tasks such as compression. Among full-reference IQA methods, traditional metrics such as PSNR and SSIM have been widely used, and metrics based on deep neural networks (deep IQAs), such as LPIPS and DISTS, have recently been adopted as well. Image scaling is known to be handled inconsistently across deep IQAs: some down-scale the input as pre-processing, whereas others use the original image size. In this paper, we show that image scale is an influential factor in deep IQA performance. We comprehensively evaluate four deep IQAs on the same five datasets, and the experimental results show that image scale significantly influences their performance. We find that the most appropriate image scale is often neither the default nor the original size, and that the best choice depends on the method and dataset. We also visualize the stability and find that PieAPP is the most stable of the four deep IQAs.
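To make the role of image scale concrete, the following is a minimal sketch of scoring one reference/distorted pair at several scales. It is not the evaluation protocol of this paper: PSNR stands in for a deep IQA so the example stays self-contained, box-filter down-scaling by an integer factor is an assumed resampling method, and the names `downscale_box`, `psnr`, and `score_across_scales` are hypothetical.

```python
import numpy as np


def downscale_box(img, factor):
    """Down-scale a 2-D grayscale image by an integer factor (box/mean filter).

    Assumption: box filtering is only one possible resampling choice.
    """
    img = img.astype(np.float64)
    if factor == 1:
        return img
    h, w = img.shape
    # Crop so that both dimensions are divisible by the factor.
    h2, w2 = h - h % factor, w - w % factor
    blocks = img[:h2, :w2].reshape(h2 // factor, factor, w2 // factor, factor)
    return blocks.mean(axis=(1, 3))


def psnr(ref, dist, max_val=255.0):
    """Peak signal-to-noise ratio in dB; infinite when the images are identical."""
    mse = np.mean((ref - dist) ** 2)
    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(max_val ** 2 / mse))


def score_across_scales(ref, dist, factors=(1, 2, 4)):
    """Score the same image pair after down-scaling both images by each factor."""
    return {f: psnr(downscale_box(ref, f), downscale_box(dist, f)) for f in factors}
```

Even with a simple metric, the score of the same pair can change substantially with scale: high-frequency distortions, for instance, are averaged away by down-scaling. In a real study, `psnr` would be replaced by the deep IQA under test, and the per-scale scores would be correlated with human opinion scores over a whole dataset.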