Objective image quality assessment (IQA) is a challenging task that aims to measure the quality of a given image automatically. Depending on the availability of a reference image, IQA is divided into Full-Reference (FR) and No-Reference (NR) tasks. Most deep learning approaches regress quality scores from deep features extracted by Convolutional Neural Networks (CNNs). For the FR task, another option is to conduct a statistical comparison of deep features. In all these methods, non-local information is usually neglected. In addition, the relationship between the FR and NR tasks has been less explored. Motivated by the recent success of transformers in modeling contextual information, we propose a unified IQA framework that uses a CNN backbone and a transformer encoder to extract features. The proposed framework is compatible with both FR and NR modes and allows for a joint training scheme. Evaluation experiments on three standard FR IQA datasets, i.e., LIVE, CSIQ, and TID2013, as well as on the NR dataset KONIQ-10K, show that the proposed model achieves state-of-the-art FR performance. Comparable NR performance is also achieved in extensive experiments, and the results show that NR performance benefits from the joint training scheme.
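To make the described architecture concrete, below is a minimal PyTorch sketch of a unified FR/NR model of this kind: CNN features are flattened into a token sequence, a transformer encoder captures non-local context via self-attention, and a single regression head outputs a quality score. The choice of ResNet-50 as the backbone, all dimensions, and the feature-fusion strategy in FR mode (differencing the distorted and reference features) are illustrative assumptions, not the authors' exact design.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50


class UnifiedIQA(nn.Module):
    """Sketch of a unified FR/NR IQA model: CNN backbone + transformer encoder."""

    def __init__(self, d_model=256, nhead=4, num_layers=2):
        super().__init__()
        backbone = resnet50(weights=None)
        # Keep only the convolutional stages; drop the pooling/classifier head.
        self.cnn = nn.Sequential(*list(backbone.children())[:-2])
        self.proj = nn.Conv2d(2048, d_model, kernel_size=1)
        enc_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=nhead, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=num_layers)
        self.head = nn.Linear(d_model, 1)  # scalar quality score

    def features(self, x):
        # CNN feature map -> (B, N, d_model) token sequence for the transformer.
        f = self.proj(self.cnn(x))
        return f.flatten(2).transpose(1, 2)

    def forward(self, dist, ref=None):
        tokens = self.features(dist)
        if ref is not None:
            # FR mode: fuse reference information (hypothetical fusion by
            # feature differencing; the paper's fusion scheme may differ).
            tokens = tokens - self.features(ref)
        # Self-attention models non-local dependencies across spatial tokens.
        tokens = self.encoder(tokens)
        return self.head(tokens.mean(dim=1)).squeeze(-1)


# NR mode: score = model(distorted); FR mode: score = model(distorted, reference).
```

Because both modes share the same backbone, encoder, and head, FR and NR batches can be mixed in one training loop, which is the kind of joint training scheme the abstract refers to.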