Tractable models of human perception have proved challenging to build. Hand-designed models such as MS-SSIM remain popular predictors of human image quality judgements due to their simplicity and speed. Recent deep learning approaches can perform better, but they rely on supervised data that can be costly to gather: large sets of class labels such as ImageNet, image quality ratings, or both. We combine recent advances in information-theoretic objective functions with a computational architecture informed by the physiology of the human visual system and unsupervised training on pairs of video frames, yielding our Perceptual Information Metric (PIM). We show that PIM is competitive with supervised metrics on the recent and challenging BAPPS image quality assessment dataset and outperforms them in predicting the ranking of image compression methods in CLIC 2020. We also perform qualitative experiments using the ImageNet-C dataset, and establish that PIM is robust with respect to architectural details.