In recent years, the demand for image compression models for machine vision has increased dramatically. However, training frameworks for image compression still target human vision and preserve excessive perceptual detail, which limits how far they can reduce the bits per pixel when the decoded image is consumed by a machine vision task. In this paper, we propose Semantic-based Low-bitrate Image compression for Machines by leveraging diffusion, termed SLIM, a new and effective training framework for machine-vision image compression that uses a pretrained latent diffusion model. The compressor model in our method focuses only on the Region-of-Interest (RoI) areas for machine vision in the image latent, so that the latent is compressed compactly. The pretrained U-Net model then enhances the decompressed latent using an RoI-focused text caption that contains semantic information about the image. As a result, SLIM can focus on the RoI areas of the image without any guide mask at the inference stage, achieving a low bitrate. SLIM can also enhance the decompressed latent through denoising steps, so the final image reconstructed from the enhanced latent is optimized for the machine vision task while still containing perceptual details for human vision. Experimental results show that SLIM achieves higher classification accuracy at the same bits per pixel than conventional image compression models for machines.
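To make two quantities in the abstract concrete, the following is a minimal sketch and not the authors' implementation. It computes bits per pixel from a compressed size, and a distortion measured only inside an RoI mask. The function names `bits_per_pixel` and `roi_masked_mse` are hypothetical. The idea that a training-time RoI mask restricts the distortion term is an assumption suggested by the abstract, which only states that no guide mask is needed at inference.

```python
from typing import Sequence


def bits_per_pixel(num_bytes: int, height: int, width: int) -> float:
    """Return the bitrate of a compressed image in bits per pixel (bpp)."""
    if height <= 0 or width <= 0:
        raise ValueError("height and width must be positive")
    return 8.0 * num_bytes / (height * width)


def roi_masked_mse(
    original: Sequence[Sequence[float]],
    reconstructed: Sequence[Sequence[float]],
    mask: Sequence[Sequence[int]],
) -> float:
    """Mean squared error over positions where mask is nonzero.

    Assumption for illustration: positions outside the RoI (mask == 0)
    contribute nothing, so a compressor trained on this term is free to
    spend no bits on them.
    """
    total = 0.0
    count = 0
    for row_o, row_r, row_m in zip(original, reconstructed, mask):
        for o, r, m in zip(row_o, row_r, row_m):
            if m:
                total += (o - r) ** 2
                count += 1
    if count == 0:
        raise ValueError("mask selects no positions")
    return total / count


if __name__ == "__main__":
    # A 512x512 image stored in 6144 bytes.
    print(bits_per_pixel(6144, 512, 512))

    # Toy 2x2 "latent": only the left column is inside the RoI.
    latent = [[1.0, 2.0], [3.0, 4.0]]
    decoded = [[1.0, 0.0], [2.0, 4.0]]
    roi = [[1, 0], [1, 0]]
    print(roi_masked_mse(latent, decoded, roi))
```

In the toy example, the large error at the top-right position lies outside the RoI and is ignored, which is the sense in which an RoI-focused objective can tolerate a lower bitrate for background regions.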



