With the wide application of stereo images in various fields, research on stereo image compression (SIC) has attracted extensive attention from both academia and industry. The core of SIC is to fully exploit the mutual information between the left and right images and to reduce inter-view redundancy as much as possible. In this paper, we propose DispSIC, an end-to-end trainable deep neural network in which a stereo matching model is jointly trained to assist the image compression task. Based on the stereo matching results (i.e., the disparity), the right image can easily be warped to the left view, so that only the residual between the left image and the warped right image needs to be encoded for the left image. DispSIC adopts a three-branch auto-encoder architecture that encodes the right image, the disparity map and the residual, respectively. During training, the whole network learns to adaptively allocate bitrate among these three parts, achieving better rate-distortion performance while spending only a small bitrate on the disparity map. Moreover, we propose a conditional entropy model with aligned cross-view priors for SIC, which uses the warped latents of the right image as priors to improve the accuracy of probability estimation for the left image. Experimental results demonstrate that the proposed method outperforms existing SIC methods on the KITTI and InStereo2K datasets, both quantitatively and qualitatively.
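The disparity-based warping and residual coding described above can be sketched as follows. This is a minimal illustration in plain Python, not the DispSIC implementation. It assumes rectified grayscale images stored as nested lists, a left-view disparity map in which left column x matches right column x - d, linear interpolation along each row, and zero fill for positions that fall outside the image. The function names `warp_right_to_left`, `residual` and `reconstruct_left` are hypothetical. In DispSIC the residual is itself compressed by a learned auto-encoder branch, whereas here it is kept exact for clarity.

```python
import math
from typing import List

# Assumption: a grayscale image is a list of rows of pixel values.
Image = List[List[float]]


def warp_right_to_left(right: Image, disparity: Image) -> Image:
    """Warp the right image into the left view (hypothetical helper).

    Assumes rectified stereo: left pixel (y, x) matches right pixel
    (y, x - d), where d = disparity[y][x]. Fractional source positions are
    linearly interpolated along the row; out-of-image samples contribute 0.
    """
    h, w = len(right), len(right[0])
    warped = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            src = x - disparity[y][x]      # source column in the right image
            x0 = math.floor(src)           # left neighbour
            frac = src - x0                # interpolation weight
            x1 = x0 + 1                    # right neighbour
            val = 0.0
            if 0 <= x0 < w:
                val += (1.0 - frac) * right[y][x0]
            if 0 <= x1 < w:
                val += frac * right[y][x1]
            warped[y][x] = val
    return warped


def residual(left: Image, warped: Image) -> Image:
    """Residual between the left image and the warped right image."""
    return [[l - p for l, p in zip(lr, pr)] for lr, pr in zip(left, warped)]


def reconstruct_left(warped: Image, res: Image) -> Image:
    """Decoder side: add the (here lossless) residual back to the prediction."""
    return [[p + r for p, r in zip(pr, rr)] for pr, rr in zip(warped, res)]


if __name__ == "__main__":
    right = [[10, 20, 30, 40]]
    disp = [[1, 1, 1, 1]]
    left = [[7, 10, 20, 30]]               # first column has no match in the right view
    pred = warp_right_to_left(right, disp)
    res = residual(left, pred)
    print(pred)                            # [[0.0, 10.0, 20.0, 30.0]]
    print(res)                             # [[7.0, 0.0, 0.0, 0.0]]
    print(reconstruct_left(pred, res))     # [[7.0, 10.0, 20.0, 30.0]]
```

In this toy row, the pixels that the disparity maps correctly produce a zero residual, and the remaining energy concentrates on the column that has no counterpart in the right view. This is the kind of inter-view redundancy the residual branch is meant to avoid re-encoding. A learned codec would use a differentiable sampler (bilinear in both axes, for example) so that gradients flow back into the stereo matching model.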