In this paper, we present CAESR, a hybrid learning-based coding approach for spatial scalability based on the Versatile Video Coding (VVC) standard. Our framework uses a low-resolution signal encoded with VVC intra-mode as the base layer (BL), and a deep conditional autoencoder with hyperprior (AE-HP) as the enhancement-layer (EL) model. The EL encoder takes as input both the upscaled BL reconstruction and the original image. Our approach relies on conditional coding, which learns the optimal mixture of the source and the upscaled BL image and thereby achieves better performance than residual coding. On the decoder side, a super-resolution (SR) module is used to recover high-resolution details and invert the conditional coding process. Experimental results show that our solution is competitive with VVC full-resolution intra coding while being scalable.
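To illustrate the difference between residual and conditional coding mentioned above, the following is a minimal NumPy sketch, not the CAESR implementation. It assumes a 2x spatial ratio, replaces VVC intra coding with coarse uniform quantization, and replaces the learned nonlinear AE-HP encoder with a two-weight linear mixture. All function names and parameters (`toy_base_layer`, `conditional_input`, `w_x`, `w_b`, `step`) are hypothetical.

```python
import numpy as np

def downscale2x(img):
    """Average-pool by 2 in each dimension (assumed stand-in for the BL downsampler)."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upscale2x(img):
    """Nearest-neighbour upscaling (assumed stand-in for the BL upsampler)."""
    return np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)

def toy_base_layer(img, step=16.0):
    """Downscale, then quantize coarsely: a placeholder for VVC intra coding."""
    return np.round(downscale2x(img) / step) * step

def residual_input(x, x_up):
    """Residual coding: the EL codec only ever sees the fixed difference x - x_up."""
    return x - x_up

def conditional_input(x, x_up, w_x, w_b):
    """Conditional coding (linear toy): the EL encoder sees both signals and
    is free to choose how to mix them; w_x and w_b would be learned."""
    return w_x * x + w_b * x_up

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 255.0, size=(8, 8))   # toy "original image"
x_up = upscale2x(toy_base_layer(x))        # upscaled BL reconstruction

# Residual coding is the special case (w_x, w_b) = (1, -1) of the mixture.
same = np.allclose(conditional_input(x, x_up, 1.0, -1.0), residual_input(x, x_up))
```

In this toy setting, residual coding corresponds to one fixed choice of mixing weights, whereas a conditional encoder can search over all mixtures. That is the intuition behind the claim that conditional coding can match or improve on residual coding; the actual EL model in the paper is a deep autoencoder rather than a linear map.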