Reducing the data footprint of visual content via image compression is essential for lowering storage requirements, as well as the bandwidth and latency required for transmission. In particular, compressed images allow faster data transfer and faster response times for visual recognition on edge devices that rely on cloud-based services. In this paper, we first analyze the impact of image compression, using traditional codecs as well as recent state-of-the-art neural compression approaches, on three visual recognition tasks: image classification, object detection, and semantic segmentation. We consider a wide range of compression levels, from 0.1 to 2 bits per pixel (bpp). We find that for all three tasks, recognition performance is significantly degraded under strong compression. For example, segmentation accuracy drops from 44.5 to 30.5 mIoU when compressing to 0.1 bpp using the best compression model we evaluated. Second, we test to what extent this performance drop can be ascribed to a loss of relevant information in the compressed image, or to a lack of generalization of visual recognition models to images with compression artefacts. We find that to a large extent the performance loss is due to the latter: by finetuning the recognition models on compressed training images, most of the performance loss is recovered. For example, finetuning brings segmentation accuracy back up to 42 mIoU, recovering 82% of the original drop in accuracy.
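
The two quantities the abstract relies on, the bitrate in bpp and the share of the accuracy drop recovered by finetuning, can be made concrete with a short sketch. This is an illustration only, not the paper's evaluation code: the function names `bits_per_pixel` and `recovered_fraction` and the 512x512 image of 3,277 bytes are assumptions introduced here, while the mIoU values are those quoted above.

```python
def bits_per_pixel(num_bytes: int, width: int, height: int) -> float:
    """Bitrate of a compressed image: total bits divided by pixel count."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    return 8 * num_bytes / (width * height)


def recovered_fraction(baseline: float, degraded: float, finetuned: float) -> float:
    """Share of the compression-induced drop that finetuning wins back."""
    drop = baseline - degraded
    if drop <= 0:
        raise ValueError("degraded score must be below the baseline")
    return (finetuned - degraded) / drop


# Hypothetical 512x512 image stored in 3,277 bytes (assumed for illustration).
print(f"{bits_per_pixel(3277, 512, 512):.3f} bpp")  # 0.100 bpp

# Segmentation mIoU values quoted in the abstract: 44.5 -> 30.5 -> 42.
print(f"{recovered_fraction(44.5, 30.5, 42.0):.0%} of the drop recovered")  # 82%
```

The second call reproduces the figure in the text: the drop is 44.5 - 30.5 = 14.0 mIoU, finetuning regains 42 - 30.5 = 11.5 mIoU, and 11.5 / 14.0 is approximately 0.82.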