Recent years have witnessed an increasing interest in end-to-end learned video compression. Most previous works exploit temporal redundancy by estimating and compressing a motion map to warp the reference frame towards the target frame. Yet, these approaches fail to adequately take advantage of the historical priors in the sequential reference frames. In this paper, we propose an Advanced Learned Video Compression (ALVC) approach with an in-loop frame prediction module, which is able to effectively predict the target frame from the previously compressed frames, \textit{without consuming any bit-rate}. The predicted frame can serve as a better reference than the previously compressed frame, and therefore benefits the compression performance. The proposed in-loop prediction module is a part of the end-to-end video compression framework and is jointly optimized with it. We propose recurrent and bi-directional in-loop prediction modules for compressing P-frames and B-frames, respectively. Experiments show that our ALVC approach achieves state-of-the-art performance in learned video compression. We also outperform the default hierarchical B mode of x265 in terms of PSNR and beat the slowest mode of the SSIM-tuned x265 in terms of MS-SSIM. Project page: https://github.com/RenYang-home/ALVC.
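To make the in-loop prediction idea concrete, the sketch below is a minimal, non-learned stand-in: it replaces ALVC's learned recurrent and bi-directional prediction modules with simple linear extrapolation (P-frames) and averaging (B-frames) of already-decoded frames. The function names and shapes are hypothetical illustrations, not the paper's implementation; the point it demonstrates is that the prediction uses only decoded frames available to both encoder and decoder, so the improved reference costs zero bits.

```python
import numpy as np

def predict_p_frame(ref_prev2, ref_prev1):
    """Toy stand-in for the recurrent in-loop prediction of a P-frame:
    linearly extrapolate from the two most recent decoded frames.
    ALVC instead uses a learned recurrent network here."""
    return np.clip(2.0 * ref_prev1 - ref_prev2, 0.0, 1.0)

def predict_b_frame(ref_past, ref_future):
    """Toy stand-in for the bi-directional in-loop prediction of a B-frame:
    average the nearest decoded frames on both sides.
    ALVC instead uses a learned bi-directional network here."""
    return 0.5 * (ref_past + ref_future)

# Key point: both encoder and decoder run the same prediction on
# already-decoded frames, so the predicted reference consumes no bits;
# only the (smaller) residual w.r.t. this better reference is coded.
decoded = [np.random.rand(1, 3, 64, 64), np.random.rand(1, 3, 64, 64)]
reference_for_next_p_frame = predict_p_frame(decoded[-2], decoded[-1])
```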