An increasing share of captured images and videos is transmitted for storage and remote analysis by computer vision algorithms rather than for viewing by humans. Unlike traditional standard codecs, which rely on hand-engineered tools, neural-network-based codecs can be trained end-to-end to compress images optimally with respect to a target rate and any given differentiable performance metric. Although such codecs can be trained to achieve better rate-accuracy performance for a particular computer vision task, it can be practical and relevant to reuse the same compressed bit-stream for multiple machine tasks. For this purpose, we introduce 'Connectors', which are inserted between the decoder and the task algorithms and directly transform compressed content that was originally optimized for one specific task so that it can serve multiple other machine tasks. We demonstrate the effectiveness of the proposed method by achieving significant rate-accuracy performance improvements for both image classification and object segmentation using the same bit-stream, which was originally optimized for object detection.
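To make the placement of a connector concrete, here is a minimal toy sketch in plain Python. It is not the method evaluated in this work: the per-channel affine connector, the stand-in `decode` and `task_head` functions, the synthetic data, and all names and hyperparameters are assumptions chosen for illustration. It shows only the general idea that the decoder and the task algorithm stay frozen while a small module between them is trained for the new task.

```python
import random

def decode(bitstream):
    # Hypothetical frozen decoder: its output was tuned for a different
    # task, so it is a distorted view of the content for the new task.
    return [0.5 * x + 0.2 for x in bitstream]

def task_head(features, weights):
    # Hypothetical frozen task algorithm: a fixed linear scorer.
    return sum(w * f for w, f in zip(weights, features))

class Connector:
    """Per-channel affine map; the only trainable part in this sketch."""
    def __init__(self, dim):
        self.scale = [1.0] * dim   # starts as the identity mapping
        self.bias = [0.0] * dim

    def __call__(self, features):
        return [s * f + b for s, f, b in zip(self.scale, features, self.bias)]

def mean_squared_error(connector, samples, weights):
    errors = [task_head(connector(decode(x)), weights) - y for x, y in samples]
    return sum(e * e for e in errors) / len(errors)

def train_connector(connector, samples, weights, lr=0.1, epochs=200):
    # Stochastic gradient descent on 0.5 * err**2; decoder and task head
    # are never updated, only the connector's scale and bias.
    for _ in range(epochs):
        for bitstream, target in samples:
            feats = decode(bitstream)
            err = task_head(connector(feats), weights) - target
            for i, (w, f) in enumerate(zip(weights, feats)):
                connector.scale[i] -= lr * err * w * f
                connector.bias[i] -= lr * err * w
    return connector

# Synthetic data (assumption): targets are what the task head would
# output on the undistorted content.
rng = random.Random(0)
weights = [1.0, -0.8, 0.6]
contents = [[rng.random() for _ in weights] for _ in range(20)]
samples = [(x, task_head(x, weights)) for x in contents]

connector = Connector(len(weights))
before = mean_squared_error(connector, samples, weights)
train_connector(connector, samples, weights)
after = mean_squared_error(connector, samples, weights)
print(f"task error before: {before:.6f}, after: {after:.6f}")
```

In a realistic setting the connector would more plausibly be a small neural network trained with the task's own loss; the affine map is used here only to keep the sketch short and dependency-free.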