We consider the problem of compressing an information source when a correlated source is available as side information only at the decoder, which is a special case of the distributed source coding problem in information theory. In particular, we consider as correlated image sources a pair of stereo images that have overlapping fields of view and are captured by a synchronized and calibrated pair of cameras. In previously proposed methods, the encoder transforms the input image into a latent representation using a deep neural network, quantizes it, and compresses the quantized latent representation losslessly using entropy coding. The decoder decodes the entropy-coded quantized latent representation and reconstructs the input image from this representation and the available side information. In the proposed method, the decoder employs a cross-attention module to align the feature maps obtained from the received latent representation of the input image with those obtained from a latent representation of the side information. We argue that aligning the correlated patches in the feature maps allows better utilization of the side information. We empirically demonstrate the competitiveness of the proposed algorithm on the KITTI and Cityscapes datasets of stereo image pairs. Our experimental results show that the proposed architecture exploits the decoder-only side information more efficiently than previous works.
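The alignment step performed by the cross-attention module can be illustrated with a minimal sketch. It assumes single-head scaled dot-product attention over flattened feature-map patches, with queries taken from the input-image features and keys and values taken from the side-information features. The function name `cross_attention`, the projection matrices `w_q`, `w_k`, `w_v`, the random inputs, and the fusion by concatenation are all hypothetical choices made for illustration; the abstract does not specify the actual architecture.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(input_feats, side_feats, w_q, w_k, w_v):
    """Align side-information patches to input-image patches.

    input_feats: (N, C) patches from the decoded latent of the input image.
    side_feats:  (M, C) patches from the latent of the side-information image.
    Returns the aligned side features (N, D) and the attention weights (N, M).
    """
    q = input_feats @ w_q                        # queries from the input image
    k = side_feats @ w_k                         # keys from the side information
    v = side_feats @ w_v                         # values from the side information
    scores = q @ k.T / np.sqrt(q.shape[-1])      # patch-to-patch similarity
    weights = softmax(scores, axis=-1)           # each row sums to 1
    aligned = weights @ v                        # weighted mix of side patches
    return aligned, weights


# Toy example with random features (placeholders for learned feature maps).
rng = np.random.default_rng(0)
N, M, C, D = 6, 8, 16, 16
input_feats = rng.standard_normal((N, C))
side_feats = rng.standard_normal((M, C))
# Hypothetical projection matrices; in a trained model these would be learned.
w_q, w_k, w_v = (rng.standard_normal((C, D)) / np.sqrt(C) for _ in range(3))

aligned, weights = cross_attention(input_feats, side_feats, w_q, w_k, w_v)

# One possible way to hand the aligned side information to a reconstruction
# network: concatenate it with the input-image features along the channel axis.
fused = np.concatenate([input_feats, aligned], axis=-1)
```

Each row of `weights` indicates which side-information patches a given input-image patch attends to, so `aligned` carries side information rearranged to match the spatial layout of the input-image features.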