Nowadays, many of the images captured are "observed" only by machines and not by humans, for example, by the cameras of robots or autonomous cars. High-level machine vision models, such as object recognition, assume images have been transformed to some canonical image space by the camera ISP. However, the camera ISP is optimized to produce images that are visually pleasing to human observers, not to machines; thus, one may spare the ISP compute time and apply the vision models directly to the raw data. Yet, it has been shown that training such models directly on RAW images results in a performance drop. To mitigate this drop in performance (without the need to annotate RAW data), we use a dataset of RAW and RGB image pairs, which can be acquired easily with no human labeling. We then train a model that is applied directly to the RAW data, using knowledge distillation so that the model's predictions for RAW images align with the predictions of an off-the-shelf pre-trained model on the corresponding processed RGB images. Our experiments show that our performance on RAW images is significantly better than that of a model trained on labeled RAW images. It also reasonably matches the predictions of a pre-trained model on processed RGB images, while saving the ISP compute overhead.
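The distillation objective described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the temperature `T`, the `1e-12` smoothing constant, and the function names are our own illustrative choices. The student network sees a RAW image while the teacher (a frozen, pre-trained RGB model) sees the paired ISP-processed RGB image, and the loss is the KL divergence between their class distributions.

```python
import numpy as np

def softmax(z, T=1.0):
    # Temperature-scaled softmax over the last axis (numerically stable).
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits_raw, teacher_logits_rgb, T=2.0):
    """KL divergence between the teacher's predictions on the processed
    RGB images and the student's predictions on the paired RAW images,
    averaged over the batch."""
    p = softmax(teacher_logits_rgb, T)   # teacher: pre-trained RGB model
    q = softmax(student_logits_raw, T)   # student: model applied to RAW
    kl = np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12)), axis=-1)
    return float(kl.mean())

# When the student reproduces the teacher's predictions the loss is
# (near) zero; mismatched predictions give a positive loss.
logits = np.array([[2.0, 0.5, -1.0]])
assert distillation_loss(logits, logits) < 1e-9
assert distillation_loss(np.array([[0.0, 2.0, 0.0]]), logits) > 0.1
```

In a real training loop, `teacher_logits_rgb` would come from the frozen off-the-shelf model and only the student's parameters would be updated, so no class labels are needed for the RAW images.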