Image quality assessment (IQA) algorithms aim to quantify human perception of image quality. Unfortunately, their performance drops when assessing distorted images generated by generative adversarial networks (GANs), which exhibit seemingly realistic textures. In this work, we conjecture that this maladaptation lies in the backbone of IQA models: patch-level prediction methods take independent image patches as input and compute their scores separately, but lack spatial relationship modeling among image patches. We therefore propose an Attention-based Hybrid Image Quality Assessment Network (AHIQ) to address this challenge and achieve better performance on the GAN-based IQA task. First, we adopt a two-branch architecture, comprising a vision transformer (ViT) branch and a convolutional neural network (CNN) branch, for feature extraction. This hybrid architecture combines the interaction information among image patches captured by the ViT with the local texture details captured by the CNN. To make the features from the shallow CNN layers focus on visually salient regions, a deformable convolution is applied, guided by semantic information from the ViT branch. Finally, a patch-wise score prediction module produces the final score. Experiments show that our model outperforms state-of-the-art methods on four standard IQA datasets, and AHIQ ranked first in the Full Reference (FR) track of the NTIRE 2022 Perceptual Image Quality Assessment Challenge.
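To make the final step concrete, here is a minimal NumPy sketch of a patch-wise score prediction scheme of the kind the abstract describes: each patch receives a quality score and a spatial attention weight, and the image-level score is their weighted average. The softmax weighting and the grid size are illustrative assumptions, not the paper's exact head design.

```python
import numpy as np

def patchwise_score(patch_scores: np.ndarray, patch_weights: np.ndarray) -> float:
    """Combine per-patch quality scores into one image-level score via a
    spatially weighted average. Weights are softmax-normalized so that
    salient patches contribute more to the final prediction."""
    w = np.exp(patch_weights - patch_weights.max())  # numerically stable softmax
    w = w / w.sum()
    return float((w * patch_scores).sum())

# Example: a 14x14 grid of per-patch predictions with learned weights
# (14x14 corresponds to a 224x224 image split into 16x16 patches).
rng = np.random.default_rng(0)
scores = rng.uniform(0.0, 1.0, size=(14, 14))   # hypothetical patch scores
weights = rng.normal(size=(14, 14))             # hypothetical attention logits
print(patchwise_score(scores, weights))
```

Because the weights are normalized, the output is a convex combination of the patch scores and always stays within their range; uniform weights reduce it to a plain average.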