Traditional image/video compression aims to reduce transmission/storage cost while keeping signal fidelity as high as possible. However, with the growing demand for machine analysis and semantic monitoring in recent years, semantic fidelity, rather than signal fidelity, is becoming an emerging concern in image/video compression. Building on recent advances in cross-modal translation and generation, in this paper we propose cross-modal compression (CMC), a semantic compression framework for visual data. CMC transforms highly redundant visual data (such as images and videos) into a compact, human-comprehensible domain (such as text, sketches, semantic maps, and attributes) while preserving its semantics. Specifically, we first formulate the CMC problem as a rate-distortion optimization problem. Second, we examine its relationship to traditional image/video compression and to recent feature compression frameworks, highlighting how CMC differs from these prior frameworks. Third, we propose a novel paradigm for CMC to demonstrate its effectiveness. Qualitative and quantitative results show that the proposed CMC achieves encouraging reconstruction results at ultrahigh compression ratios and outperforms the widely used JPEG baseline.
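As a rough illustration of why a text-domain representation can reach very high compression ratios, and of the generic rate-distortion trade-off mentioned above, the sketch below compares the size of a raw 8-bit RGB image with a UTF-8 caption and picks, among candidate captions, the one minimizing a Lagrangian cost J = R + λ·D. The captions, their semantic distortion scores, the value of λ, and all function names are hypothetical placeholders; the paper's actual objective and semantic distortion measure may differ.

```python
from dataclasses import dataclass
from typing import List


def raw_image_bits(width: int, height: int, channels: int = 3, bit_depth: int = 8) -> int:
    """Size of an uncompressed image in bits."""
    return width * height * channels * bit_depth


def text_bits(caption: str) -> int:
    """Size of a caption stored as plain UTF-8 text (no entropy coding applied)."""
    return 8 * len(caption.encode("utf-8"))


@dataclass
class Candidate:
    caption: str
    # Hypothetical semantic distortion score in [0, 1]; lower means closer meaning.
    semantic_distortion: float


def rd_cost(c: Candidate, lam: float) -> float:
    """Generic Lagrangian rate-distortion cost J = R + lambda * D (rate in bits)."""
    return text_bits(c.caption) + lam * c.semantic_distortion


def pick_best(candidates: List[Candidate], lam: float) -> Candidate:
    """Return the candidate with the lowest rate-distortion cost."""
    return min(candidates, key=lambda c: rd_cost(c, lam))


if __name__ == "__main__":
    image_bits = raw_image_bits(256, 256)
    candidates = [
        Candidate("a dog", 0.6),
        Candidate("a brown dog running on grass", 0.2),
        Candidate("a small brown dog running across a green lawn in sunlight", 0.1),
    ]
    lam = 1000.0  # hypothetical weight: larger values favor semantic fidelity over rate
    best = pick_best(candidates, lam)
    ratio = image_bits / text_bits(best.caption)
    print(f"chosen caption: {best.caption!r}")
    print(f"raw image: {image_bits} bits, caption: {text_bits(best.caption)} bits")
    print(f"compression ratio: {ratio:.0f}:1")
```

With these toy numbers the mid-length caption wins: the shortest caption is cheap in bits but pays a large distortion penalty, while the longest one spends more bits than its small fidelity gain is worth. Raising λ shifts the choice toward longer, more faithful captions. The printed ratio reflects only this toy setup, not results from the paper.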