Disentangling the Link Between Image Statistics and Human Perception

In the 1950s, Horace Barlow and Fred Attneave proposed that sensory systems are adapted to their environment: early vision evolved to maximise the information it conveys about incoming signals. Following Shannon's definition, this information was described in terms of the probability of images taken from natural scenes. Previously, computational limitations made direct, accurate predictions of image probabilities impossible. The idea was therefore explored only indirectly, mainly through oversimplified models of the image density or through system design methods; even so, these approaches succeeded in reproducing a wide range of physiological and psychophysical phenomena. In this paper, we directly evaluate the probability of natural images and analyse how it may determine perceptual sensitivity. We employ image quality metrics that correlate well with human opinion as a surrogate for human vision, and an advanced generative model to estimate the probability directly. Specifically, we analyse how the sensitivity of full-reference image quality metrics can be predicted from quantities derived directly from the probability distribution of natural images. First, we compute the mutual information between a wide range of probability surrogates and the sensitivity of the metrics, and find that the most influential factor is the probability of the noisy image. Then we explore how these probability surrogates can be combined using a simple model to predict the metric sensitivity, giving an upper bound of 0.85 for the correlation between the model predictions and the actual perceptual sensitivity. Finally, we explore how to combine the probability surrogates using simple expressions, and obtain two functional forms (using one or two surrogates) that can be used to predict the sensitivity of the human visual system given a particular pair of images.
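
The analysis pipeline described in the abstract (probability surrogate, metric sensitivity, mutual information, simple predictive model) can be illustrated with a minimal synthetic sketch. Everything in it is an assumption made for illustration and is not taken from the paper: `gaussian_log_prob` is a toy isotropic Gaussian standing in for the generative model, `toy_metric` is a hypothetical distance standing in for a perceptual quality metric, sensitivity is assumed to be the metric distance divided by the RMSE of the distortion, and a histogram estimator and a quadratic fit stand in for the mutual information estimate and the "simple model". The numbers it produces say nothing about the paper's results.

```python
import numpy as np

def gaussian_log_prob(x, sigma=1.0):
    """Toy stand-in for log p(x): isotropic Gaussian log-density (assumption)."""
    d = x.shape[-1]
    return (-0.5 * np.sum(x**2, axis=-1) / sigma**2
            - 0.5 * d * np.log(2 * np.pi * sigma**2))

def toy_metric(x, x_noisy):
    """Hypothetical 'perceptual' distance: Euclidean distance attenuated
    by the mean absolute amplitude of the reference signal."""
    dist = np.sqrt(np.sum((x - x_noisy)**2, axis=-1))
    return dist / (1.0 + np.mean(np.abs(x), axis=-1))

def sensitivity(x, x_noisy):
    """Assumed definition: metric distance per unit of RMSE distortion."""
    rmse = np.sqrt(np.mean((x - x_noisy)**2, axis=-1))
    return toy_metric(x, x_noisy) / rmse

def histogram_mutual_information(a, b, bins=16):
    """Plug-in mutual information estimate (in bits) from a 2D histogram."""
    joint, _, _ = np.histogram2d(a, b, bins=bins)
    p_ab = joint / joint.sum()
    p_a = p_ab.sum(axis=1, keepdims=True)   # marginal of a, shape (bins, 1)
    p_b = p_ab.sum(axis=0, keepdims=True)   # marginal of b, shape (1, bins)
    nz = p_ab > 0                            # skip empty cells to avoid log(0)
    return float(np.sum(p_ab[nz] * np.log2(p_ab[nz] / (p_a @ p_b)[nz])))

# Synthetic "images": 2000 vectors of 64 values, plus small additive noise.
rng = np.random.default_rng(0)
x = rng.normal(size=(2000, 64))
x_noisy = x + 0.1 * rng.normal(size=x.shape)

s = sensitivity(x, x_noisy)                  # one sensitivity per image pair
logp_noisy = gaussian_log_prob(x_noisy)      # surrogate: log p of the noisy image

# Step 1: how much does the surrogate tell us about the sensitivity?
mi = histogram_mutual_information(logp_noisy, s)

# Step 2: fit a simple model (quadratic in the standardised surrogate)
# and measure the correlation between its predictions and the sensitivity.
z = (logp_noisy - logp_noisy.mean()) / logp_noisy.std()
coeffs = np.polyfit(z, s, deg=2)
pred = np.polyval(coeffs, z)
rho = float(np.corrcoef(pred, s)[0, 1])

print(f"mutual information: {mi:.3f} bits, correlation: {rho:.3f}")
```

In this toy setting the surrogate and the sensitivity are related by construction, because both depend on the amplitude of the signal. With real images, the density would come from a trained generative model and the distance from an actual full-reference quality metric.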
