Learning-based video compression has been studied extensively in recent years, but it remains limited in its ability to adapt to diverse motion patterns and entropy models. In this paper, we propose multi-mode video compression (MMVC), a block-wise mode-ensemble deep video compression framework that selects the optimal mode for feature-domain prediction according to the motion pattern at hand. The proposed modes include ConvLSTM-based feature-domain prediction, optical-flow-conditioned feature-domain prediction, and feature propagation, covering a wide range of cases from static scenes without apparent motion to dynamic scenes captured by a moving camera. We partition the feature space into blocks so that temporal prediction operates on spatial block-based representations. For entropy coding, we handle both dense and sparse post-quantization residual blocks, applying optional run-length coding to sparse residuals to improve the compression rate. In effect, our method uses a dual-mode entropy coding scheme guided by a binary density map, whose rate savings substantially outweigh the extra cost of transmitting the binary selection map. We validate our scheme on widely used benchmark datasets. Compared with state-of-the-art learned video compression schemes and standard codecs, our method yields better or competitive results as measured by PSNR and MS-SSIM.
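To make the block-wise mode selection concrete, the following is a minimal sketch of the idea only: per spatial block, each candidate predictor is evaluated and the one with the cheapest quantized residual is kept, along with a per-block mode map. The predictor stand-ins, the block size, the quantizer, and the residual-L1 rate proxy are all our assumptions for illustration, not the paper's implementation.

```python
# Minimal sketch of block-wise prediction mode selection (illustrative only).
# The three toy predictors stand in for ConvLSTM-based prediction,
# optical-flow-conditioned prediction, and feature propagation; the block
# size, quantizer, and L1 rate proxy are assumptions, not the authors' code.
import numpy as np

BLOCK = 8  # block size in the feature domain (hypothetical)

def quantize(x, step=0.5):
    return np.round(x / step)

def select_modes(prev_feat, cur_feat, predictors):
    """For each spatial block, keep the mode whose quantized residual is cheapest."""
    H, W = cur_feat.shape
    mode_map = np.zeros((H // BLOCK, W // BLOCK), dtype=np.int64)
    residual = np.zeros_like(cur_feat)
    preds = [p(prev_feat) for p in predictors]  # full-frame predictions per mode
    for by in range(H // BLOCK):
        for bx in range(W // BLOCK):
            sl = (slice(by * BLOCK, (by + 1) * BLOCK),
                  slice(bx * BLOCK, (bx + 1) * BLOCK))
            # Sum of absolute quantized residuals as a crude rate proxy.
            costs = [np.abs(quantize(cur_feat[sl] - p[sl])).sum() for p in preds]
            best = int(np.argmin(costs))
            mode_map[by, bx] = best
            residual[sl] = quantize(cur_feat[sl] - preds[best][sl])
    return mode_map, residual

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    prev, cur = rng.normal(size=(32, 32)), rng.normal(size=(32, 32))
    predictors = [
        lambda f: f,                      # feature propagation (copy previous features)
        lambda f: np.roll(f, 1, axis=1),  # crude motion-compensated shift
        lambda f: 0.9 * f,                # stand-in for a learned temporal predictor
    ]
    mode_map, residual = select_modes(prev, cur, predictors)
    print(mode_map)  # per-block index of the selected prediction mode
```

The decoder only needs the mode map and the residuals to reproduce each block, which is why per-block selection can pay for the side information it introduces.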
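The dual-mode entropy coding can be sketched in the same spirit: each residual block is classified as sparse or dense by its fraction of nonzero symbols, sparse blocks are run-length encoded, and a binary density map records the choice. The 0.25 density threshold and the (value, run) pair format below are hypothetical; the paper's scheme feeds an entropy coder rather than emitting raw symbols.

```python
# Minimal sketch of density-guided dual-mode residual coding (illustrative only).
# The density threshold, the (value, run) RLE format, and the raw-symbol dense
# path are our assumptions; an actual codec would entropy-code both payloads.
import numpy as np

def run_length_encode(flat):
    """Encode a 1-D symbol array as (value, run-length) pairs; wins when zeros dominate."""
    pairs, i = [], 0
    while i < len(flat):
        j = i
        while j < len(flat) and flat[j] == flat[i]:
            j += 1
        pairs.append((int(flat[i]), j - i))
        i = j
    return pairs

def encode_blocks(residual, block=8, density_thresh=0.25):
    """Per block: RLE if sparse, raw symbols if dense; a binary map records the choice."""
    H, W = residual.shape
    density_map, payload = [], []
    for by in range(0, H, block):
        for bx in range(0, W, block):
            blk = residual[by:by + block, bx:bx + block].ravel()
            sparse = np.count_nonzero(blk) / blk.size < density_thresh
            density_map.append(int(sparse))
            payload.append(run_length_encode(blk) if sparse else blk.tolist())
    return density_map, payload

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    res = np.where(rng.random((16, 16)) < 0.1, rng.integers(-3, 4, (16, 16)), 0)
    dmap, payload = encode_blocks(res)
    print(dmap)  # 1 marks blocks transmitted as run-length pairs
```

The one-bit-per-block density map is the "binary selection map" the abstract refers to: its overhead is small next to the savings from run-length coding mostly-zero blocks.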