Recent work has shown that learned image compression can outperform standard hand-crafted compression algorithms developed over decades of intensive research on the rate-distortion trade-off. As applications of computer vision grow, high-quality image reconstruction from a compressed representation is often a secondary objective. Compression that preserves high accuracy on computer vision tasks such as image segmentation, classification, and detection therefore has the potential for significant impact across a wide variety of settings. In this work, we develop a framework that produces a compression format suitable for both human and machine perception. We show that representations can be learned that simultaneously optimize for compression and for performance on core vision tasks. Our approach allows models to be trained directly on compressed representations, which yields increased performance on new tasks and in low-shot learning settings. We present results that improve on the segmentation and detection performance obtained with standard high-quality JPEGs, using representations that are four to ten times smaller in terms of bits per pixel. Further, unlike naive compression methods, at a size ten times smaller than standard JPEGs, segmentation and detection models trained on our format suffer only minor degradation in performance.
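As a rough illustration of the two quantities the abstract relies on, the sketch below computes bits per pixel for an encoded image and combines a rate term, a reconstruction distortion term, and a vision-task loss into one weighted objective. This is a minimal sketch under stated assumptions, not the paper's formulation: the function names, the weighted-sum form, the weights `lam` and `gamma`, and all numeric values are hypothetical and chosen only for illustration.

```python
def bits_per_pixel(num_bytes: int, height: int, width: int) -> float:
    """Average number of bits spent per pixel for an encoded image."""
    return 8.0 * num_bytes / (height * width)


def joint_loss(rate_bpp: float, distortion: float, task_loss: float,
               lam: float = 0.01, gamma: float = 1.0) -> float:
    """Hypothetical joint objective: rate + lam * distortion + gamma * task loss.

    The weighted-sum form and the default weights are assumptions made for
    this sketch; the source does not specify how the terms are combined.
    """
    return rate_bpp + lam * distortion + gamma * task_loss


# Illustrative numbers only (not results from the paper):
# a 512x512 image stored in 32768 bytes costs 1.0 bit per pixel.
baseline_bpp = bits_per_pixel(32768, 512, 512)

# A representation ten times smaller would cost one tenth of that rate.
compact_bpp = baseline_bpp / 10.0

# Combine rate, reconstruction error, and task error into a single scalar.
loss = joint_loss(rate_bpp=compact_bpp, distortion=100.0, task_loss=0.5)
```

In a setup like this, raising `gamma` relative to `lam` would favor task accuracy over pixel-level reconstruction, which mirrors the abstract's point that reconstruction quality is often the secondary objective.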