Conventional cameras capture image irradiance on a sensor and convert it to RGB images using an image signal processor (ISP). The images can then be used for photography or visual computing tasks in a variety of applications, such as public safety surveillance and autonomous driving. One can argue that since RAW images contain all the captured information, converting RAW to RGB with an ISP is not necessary for visual computing. In this paper, we propose a novel $\rho$-Vision framework that performs high-level semantic understanding and low-level compression directly on RAW images, without the ISP subsystem that has been used for decades. Considering the scarcity of available RAW image datasets, we first develop an unpaired CycleR2R network, based on unsupervised CycleGAN, to train modular unrolled ISP and inverse ISP (invISP) models using unpaired RAW and RGB images. We can then flexibly generate simulated RAW images (simRAW) from any existing RGB image dataset and finetune models originally trained for the RGB domain to process real-world camera RAW images. We demonstrate object detection and image compression in the RAW domain using a RAW-domain YOLOv3 and a RAW image compressor (RIC) on snapshots from various cameras. Quantitative results reveal that RAW-domain task inference provides better detection accuracy and compression than RGB-domain processing. Furthermore, the proposed $\rho$-Vision generalizes across various camera sensors and different task-specific models. By eliminating the ISP, $\rho$-Vision also offers potential reductions in computation and processing time.