In real-world image enhancement, it is often challenging (if not impossible) to acquire ground-truth data, which prevents the adoption of distance metrics for objective quality assessment. As a result, one often resorts to subjective quality assessment, the most straightforward and reliable means of evaluating image enhancement. Conventional subjective testing requires manually pre-selecting a small set of visual examples, which may suffer from three sources of bias: 1) sampling bias, due to the extremely sparse distribution of the selected samples in the image space; 2) algorithmic bias, due to potential overfitting to the selected samples; and 3) subjective bias, due to potential cherry-picking of test results. This eventually makes the field of real-world image enhancement more of an art than a science. Here we take steps towards debiasing conventional subjective assessment by automatically sampling a set of adaptive and diverse images for subsequent testing. This is achieved by casting sample selection as a joint maximization of the discrepancy between the enhancers and the diversity among the selected input images. Careful visual inspection of the resulting enhanced images provides a debiased ranking of the enhancement algorithms. We demonstrate our subjective assessment method on three popular and practically demanding image enhancement tasks: dehazing, super-resolution, and low-light enhancement.
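The joint maximization of enhancer discrepancy and input diversity can be illustrated with a minimal sketch. The abstract does not specify the discrepancy measure, the diversity measure, or the optimizer, so everything below is an assumption made for illustration: `discrepancy` is a hypothetical precomputed score of how strongly the enhancers disagree on each image, `features` is a hypothetical per-image feature vector, diversity is taken as the Euclidean distance to the nearest already-selected image, `lam` is an assumed trade-off weight, and a greedy search stands in for whatever solver the authors actually use.

```python
import math


def greedy_select(discrepancy, features, k, lam=1.0):
    """Greedily pick k image indices, trading off disagreement and diversity.

    discrepancy[i]: hypothetical score of how much the enhancers disagree
                    on image i (higher means more informative to inspect).
    features[i]:    hypothetical feature vector describing image i.
    lam:            assumed weight on the diversity term.
    """
    selected = []
    candidates = set(range(len(discrepancy)))

    while candidates and len(selected) < k:

        def gain(i):
            if not selected:
                # Nothing chosen yet, so only the discrepancy term applies.
                return discrepancy[i]
            # Diversity: distance to the closest already-selected image.
            nearest = min(math.dist(features[i], features[j]) for j in selected)
            return discrepancy[i] + lam * nearest

        # Sorting makes tie-breaking deterministic (lowest index wins).
        best = max(sorted(candidates), key=gain)
        selected.append(best)
        candidates.remove(best)

    return selected


# Toy data: images 0 and 1 are near-duplicates with high discrepancy.
scores = [0.9, 0.85, 0.2, 0.6]
feats = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (0.0, 4.0)]

print(greedy_select(scores, feats, k=2, lam=1.0))  # diversity term active
print(greedy_select(scores, feats, k=2, lam=0.0))  # discrepancy only
```

With `lam=0.0` the sketch selects the two near-duplicate images because they have the highest discrepancy; with the diversity term enabled, the second pick moves to a distant image instead. This is the kind of sampling bias the joint objective is meant to reduce.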