With the development of the Artificial Intelligence of Things (AIoT), a huge amount of visual data, e.g., images and videos, is produced in our daily work and life. These visual data are used not only for human viewing and understanding but also for machine analysis and decision-making, e.g., in intelligent surveillance, automated vehicles, and many other smart city applications. To this end, a new image codec paradigm serving both human and machine uses is proposed in this work. First, a high-level instance segmentation map and low-level signal features are extracted with neural networks. Then, the instance segmentation map is further represented as a profile using the proposed 16-bit gray-scale representation. After that, both the 16-bit gray-scale profile and the signal features are encoded with a lossless codec. Meanwhile, an image predictor is designed and trained to achieve general-quality image reconstruction from the 16-bit gray-scale profile and the signal features. Finally, the residual map between the original image and the predicted one is compressed with a lossy codec and used for high-quality image reconstruction. With this design, on the one hand, we achieve scalable image compression that meets the requirements of different human consumption scenarios; on the other hand, we can directly perform several machine vision tasks at the decoder side with the decoded 16-bit gray-scale profile, e.g., object classification, detection, and segmentation. Experimental results show that the proposed codec achieves results comparable to most learning-based codecs and outperforms traditional codecs (e.g., BPG and JPEG2000) in terms of PSNR and MS-SSIM for image reconstruction. At the same time, it outperforms existing codecs in terms of mAP for object detection and segmentation.
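To make the two ideas in the pipeline more concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a hypothetical bit layout for the 16-bit gray-scale profile (class ID in the high 8 bits, instance index in the low 8 bits, value 0 as background); the abstract does not specify the actual layout. It also stands in a toy uniform quantizer for the lossy residual codec. All function names (`pack_profile`, `unpack_profile`, `boxes_from_profile`, `quantize_residual`, `reconstruct`) are illustrative.

```python
# Assumed layout (NOT from the paper): high 8 bits = class ID,
# low 8 bits = instance index; the value 0 is treated as background.

def pack_profile(class_ids, instance_ids):
    """Pack per-pixel class and instance IDs into one 16-bit gray value."""
    profile = []
    for class_row, inst_row in zip(class_ids, instance_ids):
        row = []
        for c, i in zip(class_row, inst_row):
            if not (0 <= c < 256 and 0 <= i < 256):
                raise ValueError("each ID must fit in 8 bits")
            row.append((c << 8) | i)
        profile.append(row)
    return profile


def unpack_profile(profile):
    """Recover the class map and the instance map from the 16-bit profile."""
    class_ids = [[v >> 8 for v in row] for row in profile]
    instance_ids = [[v & 0xFF for v in row] for row in profile]
    return class_ids, instance_ids


def boxes_from_profile(profile):
    """Derive one bounding box per instance straight from the decoded profile.

    Returns {(class_id, instance_id): (x_min, y_min, x_max, y_max)}.
    """
    boxes = {}
    for y, row in enumerate(profile):
        for x, v in enumerate(row):
            if v == 0:  # background pixel
                continue
            key = (v >> 8, v & 0xFF)
            if key in boxes:
                x0, y0, x1, y1 = boxes[key]
                boxes[key] = (min(x0, x), min(y0, y), max(x1, x), max(y1, y))
            else:
                boxes[key] = (x, y, x, y)
    return boxes


def quantize_residual(original, predicted, step):
    """Toy stand-in for the lossy residual codec: uniform quantization."""
    return [[round((o - p) / step) for o, p in zip(o_row, p_row)]
            for o_row, p_row in zip(original, predicted)]


def reconstruct(predicted, q_residual, step):
    """Enhancement layer: predicted image plus dequantized residual, clipped to 8 bits."""
    return [[max(0, min(255, p + q * step)) for p, q in zip(p_row, q_row)]
            for p_row, q_row in zip(predicted, q_residual)]


# Usage: a 2x3 toy image with one instance of class 1 and one of class 2.
class_ids = [[0, 1, 1], [0, 1, 2]]
instance_ids = [[0, 1, 1], [0, 1, 1]]
profile = pack_profile(class_ids, instance_ids)
boxes = boxes_from_profile(profile)

original = [[100, 120, 130], [90, 125, 200]]
predicted = [[90, 118, 140], [95, 125, 180]]  # stand-in for the predictor output
step = 4
enhanced = reconstruct(predicted, quantize_residual(original, predicted, step), step)
```

In this sketch the base layer corresponds to `predicted` alone, while the enhancement layer adds the dequantized residual, which mirrors the scalable structure described above. Because the class and instance IDs can be read back from the profile without decoding any pixels, tasks such as detection can be served from the profile alone under the assumed layout.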