We propose a novel deep neural network (DNN) architecture for compressing an image when a correlated image is available as side information only at the decoder, a special case of the well-known and extensively studied distributed source coding (DSC) problem. In particular, we consider a pair of stereo images with overlapping fields of view, captured by a synchronized and calibrated pair of cameras, which are therefore highly correlated. We assume that one image of the pair is to be compressed and transmitted, while the other image is available only at the decoder. In the proposed architecture, the encoder maps the input image to a latent space using a DNN, quantizes the latent representation, and compresses it losslessly using entropy coding. The proposed decoder extracts useful information common to both images solely from the available side information, together with a latent representation of the side information. Then, the latent representations of the two images (one received from the encoder, the other extracted locally), along with the locally generated common information, are fed to the respective decoders of the two images. We employ a cross-attention module (CAM) to align the feature maps obtained in the intermediate layers of the respective decoders of the two images, thus allowing better utilization of the side information. We train and demonstrate the effectiveness of the proposed algorithm on various realistic setups, such as the KITTI and Cityscapes stereo image datasets. Our results show that the proposed architecture exploits the decoder-only side information more efficiently than previous works, which it outperforms. We also show that the proposed method provides significant gains even with uncalibrated and unsynchronized camera arrays.
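The abstract does not give the internals of the quantizer, entropy model, or CAM. The following is therefore only a minimal numpy sketch of the two ideas it names, under stated assumptions. Quantization is assumed to be uniform rounding to integers. The rate is approximated by the empirical entropy of the quantized symbols, standing in for the learned entropy model a real codec would use. The CAM is replaced by generic scaled dot-product cross-attention with residual fusion: queries come from the primary image's decoder features, and keys and values come from the side-information decoder features. All function names, the random "latent", and the random projection weights `w_q`, `w_k`, `w_v` are hypothetical; the paper's actual modules may differ.

```python
import numpy as np


def quantize(latent):
    """Assumed uniform scalar quantizer: round each latent value to the nearest integer."""
    return np.round(latent).astype(np.int64)


def empirical_bits(symbols):
    """Ideal code length (bits) of the symbols under their own empirical distribution.
    Hypothetical stand-in for the learned entropy model used by an actual codec."""
    _, counts = np.unique(symbols, return_counts=True)
    probs = counts / counts.sum()
    return float(-(counts * np.log2(probs)).sum())


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(feat_x, feat_y, w_q, w_k, w_v):
    """Generic cross-attention (assumed stand-in for the paper's CAM).

    feat_x: (H, W, C) intermediate features of the primary image's decoder.
    feat_y: (H', W', C) features from the matching layer of the side-information decoder.
    w_q, w_k, w_v: (C, C) projections (hypothetical; learned in a real model).
    Returns feat_x plus the attended side-information features, shape (H, W, C).
    """
    h, w, c = feat_x.shape
    x = feat_x.reshape(h * w, c)  # queries: every position of the primary image
    y = feat_y.reshape(-1, c)     # keys/values: every position of the side image
    q, k, v = x @ w_q, y @ w_k, y @ w_v
    attn = softmax(q @ k.T / np.sqrt(c), axis=-1)  # each query attends over all side positions
    return (x + attn @ v).reshape(h, w, c)         # residual fusion of aligned side info


rng = np.random.default_rng(0)

# Encoder side: a random array stands in for the DNN encoder's latent output.
latent = rng.normal(scale=2.0, size=(4, 4, 8))
q_latent = quantize(latent)
bits = empirical_bits(q_latent)

# Decoder side: fuse side-information features into the primary decoder's features.
c = 8
feat_x = rng.normal(size=(4, 4, c))
feat_y = rng.normal(size=(4, 4, c))
w_q, w_k, w_v = (rng.normal(size=(c, c)) / np.sqrt(c) for _ in range(3))
fused = cross_attention(feat_x, feat_y, w_q, w_k, w_v)
print(q_latent.shape, round(bits, 1), fused.shape)
```

Because attention is computed over all side-image positions, a query can pick up matching content even when the stereo disparity is unknown, which is one plausible reason an attention-based alignment can remain useful for uncalibrated camera pairs. Restricting attention to the same image row (epipolar line) would be a cheaper alternative for calibrated, rectified pairs.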