Images fed to a deep neural network have generally undergone several handcrafted image signal processing (ISP) operations, all of which have been optimized to produce visually pleasing images. In this work, we investigate the hypothesis that the intermediate representation of visually pleasing images is sub-optimal for downstream computer vision tasks compared to the RAW image representation. We suggest that the operations of the ISP should instead be optimized towards the end task, by learning the parameters of the operations jointly during training. We extend previous work on this topic and propose a new learnable operation that enables an object detector to achieve superior performance compared to both previous work and traditional RGB images. Experiments on the open PASCALRAW dataset empirically confirm our hypothesis.
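To make the idea of learning ISP parameters jointly with the end task concrete, the following is a minimal sketch under stated assumptions. It is not the learnable operation proposed in this work. A single gamma correction stands in for a generic ISP operation, a scalar linear head stands in for the downstream network, and a squared-error loss stands in for the detection loss. All names (`isp_gamma`, `train`, the toy data) are hypothetical and chosen for illustration only.

```python
import math


def isp_gamma(x, gamma):
    """Toy ISP operation: gamma correction of a normalized pixel in (0, 1]."""
    return x ** gamma


def train(pixels, targets, gamma=1.0, w=1.0, lr=0.05, steps=500):
    """Jointly fit the ISP parameter `gamma` and the task weight `w`.

    Both parameters receive gradients from the same task loss, so the
    ISP operation is tuned for the end task rather than for appearance.
    """
    n = len(pixels)
    history = []
    for _ in range(steps):
        loss = grad_gamma = grad_w = 0.0
        for x, t in zip(pixels, targets):
            y = isp_gamma(x, gamma)      # ISP output
            err = w * y - t              # task-head prediction error
            loss += err * err / n
            grad_w += 2.0 * err * y / n
            # d(x**gamma)/d(gamma) = x**gamma * ln(x)
            grad_gamma += 2.0 * err * w * y * math.log(x) / n
        history.append(loss)
        gamma -= lr * grad_gamma
        w -= lr * grad_w
    return gamma, w, history


# Synthetic "RAW" intensities and targets produced by an assumed
# ground-truth pipeline (gamma = 0.45, weight = 2.0); toy data only.
pixels = [i / 20 for i in range(1, 21)]
targets = [2.0 * x ** 0.45 for x in pixels]

gamma, w, history = train(pixels, targets)
print(f"learned gamma={gamma:.3f}, w={w:.3f}, final loss={history[-1]:.6f}")
```

In a real setting the gradient would be obtained by automatic differentiation through the detector, and the ISP stage would contain several such parameterized operations applied to RAW sensor data.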