Machine Perception-Driven Image Compression: A Layered Generative Approach

In the information age, images are a critical medium for storing and transmitting information. With the rapid growth in the volume of image data, visual compression and visual data perception have become two important research topics that attract considerable attention. However, the two topics are rarely discussed together and follow separate research paths. Because learning-based image compression methods offer a compact compressed-domain representation, a single bitstream can potentially serve both efficient data storage and compression and machine perception tasks. In this paper, we propose a layered generative image compression model that achieves high human-vision-oriented reconstruction quality, even at extreme compression ratios. To obtain analysis efficiency and flexibility, we propose a task-agnostic learning-based compression model that effectively supports various compressed-domain analytical tasks while preserving outstanding perceptual reconstruction quality compared with traditional and learning-based codecs. In addition, a joint optimization schedule is adopted to find the best balance among compression ratio, reconstructed image quality, and downstream perception performance. Experimental results verify that the proposed compressed-domain multi-task analysis method achieves analysis results comparable to those of RGB image-based methods with up to 99.6% bit rate savings (i.e., compared with taking the original RGB image as the analysis model input). The practicality of our model is further justified in terms of model size and information fidelity.
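To give a sense of scale for the 99.6% figure, the short sketch below converts a fractional bit rate saving into bits per pixel. It assumes the baseline is an uncompressed 8-bit RGB image at 24 bits per pixel; the abstract does not state the baseline bit rate, so that value and the helper names are illustrative assumptions rather than the authors' evaluation code.

```python
def bpp_after_saving(baseline_bpp: float, saving: float) -> float:
    """Bit rate (bits per pixel) left after a fractional saving versus a baseline."""
    if not 0.0 <= saving <= 1.0:
        raise ValueError("saving must be a fraction in [0, 1]")
    return baseline_bpp * (1.0 - saving)


def saving_from_bpp(baseline_bpp: float, compressed_bpp: float) -> float:
    """Fractional bit rate saving of a compressed stream versus a baseline."""
    if baseline_bpp <= 0:
        raise ValueError("baseline_bpp must be positive")
    return 1.0 - compressed_bpp / baseline_bpp


# Assumption (not stated in the abstract): the baseline is raw 8-bit RGB,
# i.e. 3 channels x 8 bits = 24 bits per pixel.
RAW_RGB_BPP = 3 * 8

remaining = bpp_after_saving(RAW_RGB_BPP, 0.996)
print(f"{remaining:.3f} bpp")  # 0.096 bpp under the 24 bpp assumption
```

Under that assumption, a 99.6% saving leaves roughly 0.096 bits per pixel for the analysis model's input, which is consistent with the abstract's emphasis on extreme compression ratios.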
