The performance of visual quality prediction models is commonly assumed to be closely tied to their ability to capture perceptually relevant image aspects. Models are therefore built either on sophisticated feature extractors carefully designed from extensive domain knowledge or on features optimized through feature learning. In contrast, we find that feature extractors constructed from random noise suffice to learn a linear regression model whose quality predictions reach high correlations with human visual quality ratings, on par with a model using learned features. We analyze this curious result and show that, besides the quality of the feature extractors, their quantity also plays a crucial role, with top performance achieved only in highly overparameterized models.
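The abstract does not specify the architecture, so the following is only a minimal sketch of the general idea under illustrative assumptions. Each random-noise feature extractor is assumed to be a single 3×3 convolution filter with entries drawn from N(0, 1), followed by a ReLU and global average pooling. The linear model is assumed to be ridge regression, solved in its dual form so that it stays cheap when there are more features than training images. The function names (`random_filter_bank`, `random_features`, `fit_ridge`, `toy_data`) are hypothetical, and so is the toy data: its "ratings" are simply minus the strength of added Gaussian noise, not human judgments. The example therefore illustrates the shape of the pipeline and does not reproduce the reported results.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def random_filter_bank(n_filters, size=3, channels=1, seed=0):
    # Assumption: each "feature extractor" is one conv filter of pure N(0, 1) noise, never trained.
    rng = np.random.default_rng(seed)
    return rng.standard_normal((n_filters, channels, size, size))


def random_features(images, filters):
    # images: (N, C, H, W), filters: (F, C, k, k) -> features: (N, F).
    # Valid 2D cross-correlation, then ReLU, then global average pooling (assumed design).
    k = filters.shape[-1]
    patches = sliding_window_view(images, (k, k), axis=(2, 3))  # (N, C, H-k+1, W-k+1, k, k)
    responses = np.tensordot(patches, filters, axes=([1, 4, 5], [1, 2, 3]))  # (N, H', W', F)
    return np.maximum(responses, 0.0).mean(axis=(1, 2))


def fit_ridge(X, y, lam=1.0):
    # Linear regression with an L2 penalty, solved in dual form:
    # w = X^T (X X^T + lam I)^{-1} y, which only inverts an (N x N) matrix.
    mu, sd = X.mean(axis=0), X.std(axis=0) + 1e-12
    Xs = (X - mu) / sd
    y_mean = y.mean()
    alpha = np.linalg.solve(Xs @ Xs.T + lam * np.eye(len(y)), y - y_mean)
    w = Xs.T @ alpha

    def predict(X_new):
        # Apply the training-set standardization, then the learned linear map.
        return ((X_new - mu) / sd) @ w + y_mean

    return predict


def toy_data(n, size=16, seed=1):
    # Hypothetical toy data, NOT an image quality dataset: Gaussian-noise images of random
    # strength sigma, with a stand-in "quality rating" of -sigma (more noise = worse).
    rng = np.random.default_rng(seed)
    sigma = rng.uniform(0.1, 1.0, size=n)
    images = sigma[:, None, None, None] * rng.standard_normal((n, 1, size, size))
    return images, -sigma


if __name__ == "__main__":
    filters = random_filter_bank(n_filters=256)  # 256 features vs. 100 training images
    train_imgs, y_train = toy_data(100, seed=1)
    test_imgs, y_test = toy_data(100, seed=2)
    predict = fit_ridge(random_features(train_imgs, filters), y_train)
    y_pred = predict(random_features(test_imgs, filters))
    print("Pearson r on held-out toy data:", np.corrcoef(y_pred, y_test)[0, 1])
```

With 256 random filters and 100 training images, the regression has more parameters than training samples, which loosely mirrors the overparameterized regime mentioned above. Varying `n_filters` is one way to probe how feature quantity affects the fit in this toy setting. The dual-form solve was chosen because its cost grows with the number of images rather than the number of features.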