The past few years have witnessed increasing interest in applying deep learning to video compression. However, existing approaches compress a video frame using only a small number of reference frames, which limits their ability to fully exploit the temporal correlation among video frames. To overcome this shortcoming, this paper proposes a Recurrent Learned Video Compression (RLVC) approach with a Recurrent Auto-Encoder (RAE) and a Recurrent Probability Model (RPM). Specifically, the RAE employs recurrent cells in both the encoder and the decoder. As such, temporal information across a large range of frames can be used for generating latent representations and reconstructing compressed outputs. Furthermore, the proposed RPM network recurrently estimates the Probability Mass Function (PMF) of the latent representation, conditioned on the distribution of previous latent representations. Due to the correlation among consecutive frames, the conditional cross entropy can be lower than the independent cross entropy, thus reducing the bit-rate. The experiments show that our approach achieves state-of-the-art performance among learned video compression methods in terms of both PSNR and MS-SSIM. Moreover, our approach outperforms the default Low-Delay P (LDP) setting of x265 on PSNR, and also performs better on MS-SSIM than the SSIM-tuned x265 and the slowest setting of x265. The code is available at https://github.com/RenYang-home/RLVC.git.
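The claim that conditioning on previous latent representations can lower the bit-rate can be illustrated with a toy calculation. The sketch below is not the RPM network. It assumes a hypothetical two-symbol latent alphabet and a hand-picked joint PMF in which a symbol usually repeats from one frame to the next, and it assumes the coder's PMF matches the true distribution exactly, so each cross entropy reduces to an entropy. All names (`joint`, `independent_bits`, `conditional_bits`) are illustrative.

```python
import math

# Hypothetical joint PMF p(prev, cur) over two quantized latent symbols {0, 1}.
# Assumption: consecutive frames are correlated, so the symbol usually repeats.
joint = {
    (0, 0): 0.45, (0, 1): 0.05,
    (1, 0): 0.05, (1, 1): 0.45,
}


def marginal(joint, axis):
    """Sum the joint PMF down to one variable (axis 0 = prev, axis 1 = cur)."""
    out = {}
    for pair, p in joint.items():
        out[pair[axis]] = out.get(pair[axis], 0.0) + p
    return out


def independent_bits(joint):
    """Expected bits per symbol when cur is coded with its marginal PMF."""
    p_cur = marginal(joint, 1)
    return -sum(p * math.log2(p_cur[cur]) for (_, cur), p in joint.items())


def conditional_bits(joint):
    """Expected bits per symbol when cur is coded with p(cur | prev)."""
    p_prev = marginal(joint, 0)
    return -sum(
        p * math.log2(p / p_prev[prev])
        for (prev, _), p in joint.items()
        if p > 0
    )


if __name__ == "__main__":
    print(f"independent: {independent_bits(joint):.3f} bits/symbol")
    print(f"conditional: {conditional_bits(joint):.3f} bits/symbol")
```

With this particular PMF the independent coder needs about 1.0 bit per symbol, while the conditional coder needs about 0.47. The gap shrinks to zero as the correlation between `prev` and `cur` disappears, which is why the benefit depends on temporal correlation among frames.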