Image quality assessment (IQA) is a key factor in the rapid development of image restoration (IR) algorithms. Recent perceptual IR algorithms based on generative adversarial networks (GANs) have significantly improved visual quality, but they also pose great challenges for quantitative evaluation. Notably, we observe a growing inconsistency between perceptual quality and evaluation results. We pose two questions: Can existing IQA methods objectively evaluate recent IR algorithms? With the focus on beating current benchmarks, are we getting better IR algorithms? To answer these questions and promote the development of IQA methods, we contribute a large-scale IQA dataset, the Perceptual Image Processing ALgorithms (PIPAL) dataset. In particular, this dataset includes the results of GAN-based IR algorithms, which are missing from previous datasets. We collect more than 1.13 million human judgments to assign subjective scores to PIPAL images using the more reliable Elo system. Based on PIPAL, we present new benchmarks for both IQA and super-resolution (SR) methods. Our results indicate that existing IQA methods cannot fairly evaluate GAN-based IR algorithms. While using appropriate evaluation methods is important, IQA methods should also be updated along with the development of IR algorithms. Finally, we shed light on how to improve IQA performance on GAN-based distortion. Inspired by the finding that existing IQA methods perform unsatisfactorily on GAN-based distortion partly because of their low tolerance to spatial misalignment, we propose to improve an IQA network's performance on GAN-based distortion by explicitly accounting for this misalignment. We propose the Space Warping Difference Network, which includes novel l_2 pooling layers and Space Warping Difference layers. Experiments demonstrate the effectiveness of the proposed method.
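
To illustrate how pairwise human judgments can be turned into per-image subjective scores, the following is a minimal sketch of the standard Elo update. It is not the PIPAL implementation: the function names, the initial rating of 1400, the K-factor of 16, and the scale of 400 are assumptions chosen for illustration only.

```python
def expected_score(rating_a, rating_b, scale=400.0):
    """Probability that image A is preferred over image B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


def elo_update(rating_a, rating_b, a_wins, k=16.0):
    """Return updated (rating_a, rating_b) after one pairwise judgment.

    k is an assumed K-factor; it controls how far one judgment moves a rating.
    """
    exp_a = expected_score(rating_a, rating_b)
    score_a = 1.0 if a_wins else 0.0
    delta = k * (score_a - exp_a)
    # The update is zero-sum: what A gains, B loses.
    return rating_a + delta, rating_b - delta


def rate_images(judgments, initial=1400.0, k=16.0):
    """Process (preferred, rejected) pairs in order and return image ratings.

    `initial` is an assumed starting rating for images not seen before.
    """
    ratings = {}
    for preferred, rejected in judgments:
        r_pref = ratings.get(preferred, initial)
        r_rej = ratings.get(rejected, initial)
        ratings[preferred], ratings[rejected] = elo_update(r_pref, r_rej, True, k)
    return ratings


if __name__ == "__main__":
    # Hypothetical judgments: each tuple is (preferred image, rejected image).
    votes = [("img_a", "img_b"), ("img_a", "img_c"), ("img_b", "img_c")]
    for name, score in sorted(rate_images(votes).items()):
        print(name, round(score, 2))
```

Because each update depends only on the current ratings of the two compared images, scores can be refined incrementally as more judgments arrive, without requiring every image pair to be compared.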