Learned video compression methods have demonstrated great promise in catching up with traditional video codecs in rate-distortion (R-D) performance. However, existing learned video compression schemes are limited by binding the prediction mode to a fixed network framework. They cannot support the various inter prediction modes and are therefore inapplicable to many scenarios. In this paper, to break this limitation, we propose a versatile learned video compression (VLVC) framework that uses one model to support all possible prediction modes. Specifically, to realize versatile compression, we first build a motion compensation module that applies multiple 3D motion vector fields (i.e., voxel flows) for weighted trilinear warping in spatial-temporal space. The voxel flows convey the temporal reference position, which helps decouple inter prediction modes from the framework design. Second, for multiple-reference-frame prediction, we apply a flow prediction module that predicts accurate motion trajectories with a unified polynomial function. We show that the flow prediction module can substantially reduce the transmission cost of voxel flows. Experimental results demonstrate that the proposed VLVC not only supports versatile compression in various settings but also achieves R-D performance comparable to the latest VVC standard in terms of MS-SSIM.
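To make "weighted trilinear warping with multiple voxel flows" concrete, here is a minimal pure-Python sketch. It is not the VLVC implementation; it assumes that the reference frames are stacked into one grayscale volume indexed as `volume[t][y][x]`, that each voxel flow is a triple `(t, dy, dx)` where `t` is a fractional position along the reference (temporal) axis and `(dy, dx)` is a spatial offset, that the per-flow weights come from a softmax over hypothetical logits, and that out-of-range samples are clamped to the border. All function names are illustrative.

```python
import math
from typing import List, Sequence, Tuple

Volume = List[List[List[float]]]      # reference stack, indexed volume[t][y][x]
VoxelFlow = Tuple[float, float, float]  # assumed layout: (t, dy, dx)


def _clamp(v: int, lo: int, hi: int) -> int:
    return max(lo, min(hi, v))


def trilinear_sample(volume: Volume, t: float, y: float, x: float) -> float:
    """Trilinearly interpolate the volume at fractional (t, y, x), clamping at borders."""
    T, H, W = len(volume), len(volume[0]), len(volume[0][0])
    t0, y0, x0 = math.floor(t), math.floor(y), math.floor(x)
    ft, fy, fx = t - t0, y - y0, x - x0
    value = 0.0
    # Blend the 8 surrounding voxels, weighting each by its distance along t, y and x.
    for dt, wt in ((0, 1.0 - ft), (1, ft)):
        for dy, wy in ((0, 1.0 - fy), (1, fy)):
            for dx, wx in ((0, 1.0 - fx), (1, fx)):
                ti = _clamp(t0 + dt, 0, T - 1)
                yi = _clamp(y0 + dy, 0, H - 1)
                xi = _clamp(x0 + dx, 0, W - 1)
                value += wt * wy * wx * volume[ti][yi][xi]
    return value


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax, used here (as an assumption) to normalize flow weights."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def warp_pixel(volume: Volume, y: int, x: int,
               flows: Sequence[VoxelFlow], logits: Sequence[float]) -> float:
    """Predict one pixel as a weighted sum of trilinear samples, one per voxel flow."""
    weights = softmax(logits)
    return sum(w * trilinear_sample(volume, t, y + dy, x + dx)
               for w, (t, dy, dx) in zip(weights, flows))


def warp_frame(volume: Volume,
               flow_fields: List[List[List[VoxelFlow]]],
               logit_fields: List[List[List[float]]]) -> List[List[float]]:
    """Apply warp_pixel at every position; flow_fields[y][x] holds that pixel's K flows."""
    H, W = len(volume[0]), len(volume[0][0])
    return [[warp_pixel(volume, y, x, flow_fields[y][x], logit_fields[y][x])
             for x in range(W)] for y in range(H)]
```

Because the temporal coordinate is just another interpolation axis, pointing a flow at a past frame, a future frame, or a blend of several references only changes the flow values, not the code path. That is one way to read the abstract's claim that voxel flows decouple the prediction mode from the network design.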
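The abstract does not spell out the polynomial used by the flow prediction module, so the following is only a sketch of the general idea under stated assumptions. It models a pixel's motion trajectory as a polynomial p(t) = c1·t + c2·t² + … + cn·tⁿ with p(0) = 0 (the current frame sits at t = 0), fits the n coefficients exactly from flows to n already-known references at distinct nonzero times, and extrapolates the flow to a further reference. The helper names (`solve_linear`, `fit_trajectory`, `predict_flow`) and the per-component fitting are hypothetical.

```python
from typing import List, Sequence, Tuple


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: reference times must be distinct and nonzero")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_trajectory(times: Sequence[float], displacements: Sequence[float]) -> List[float]:
    """Fit p(t) = c1*t + ... + cn*t^n (so p(0) = 0) through n (time, displacement) pairs."""
    n = len(times)
    a = [[t ** k for k in range(1, n + 1)] for t in times]
    return solve_linear(a, list(displacements))


def predict_flow(times: Sequence[float], flows: Sequence[Tuple[float, float]],
                 t_query: float) -> Tuple[float, float]:
    """Extrapolate the (dx, dy) flow to a reference at t_query, one polynomial per component."""
    cx = fit_trajectory(times, [f[0] for f in flows])
    cy = fit_trajectory(times, [f[1] for f in flows])

    def evaluate(coeffs: List[float]) -> float:
        return sum(c * t_query ** (k + 1) for k, c in enumerate(coeffs))

    return evaluate(cx), evaluate(cy)
```

With one known reference the fit reduces to constant velocity; with two it becomes a quadratic (constant-acceleration) trajectory. For example, known flows (-3, 1) to t = -1 and (5, 1) to t = 1 give x(t) = 4t + t² and y(t) = t², so `predict_flow([-1.0, 1.0], [(-3.0, 1.0), (5.0, 1.0)], 2.0)` yields (12.0, 4.0). If, as assumed here, the encoder then codes only the difference from this prediction, an actual flow of (12.5, 3.5) would leave a residual of just (0.5, -0.5) to transmit, which illustrates how flow prediction can cut the cost of sending voxel flows.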