Single-image high dynamic range (SI-HDR) reconstruction has recently emerged as a problem well-suited for deep learning methods. Each successive technique demonstrates an improvement over existing methods by reporting higher image quality scores. This paper, however, highlights that such improvements in objective metrics do not necessarily translate to visually superior images. The first problem is the use of disparate evaluation conditions in terms of data and metric parameters, which calls for a standardized protocol that makes results comparable across papers. The second problem, which forms the main focus of this paper, is the inherent difficulty in evaluating SI-HDR reconstructions: certain aspects of the reconstruction problem dominate objective differences, thereby introducing a bias. Here, we reproduce a typical evaluation using existing as well as simulated SI-HDR methods to demonstrate how different aspects of the problem affect objective quality metrics. Surprisingly, we find that methods that do not even reconstruct HDR information can compete with state-of-the-art deep learning methods. We show that such results are not representative of perceived quality and that SI-HDR reconstruction needs better evaluation protocols.