In the VP9 video codec, block sizes are decided during encoding by recursively partitioning 64$\times$64 superblocks using rate-distortion optimization (RDO). This process is computationally intensive because the space of possible partitions of a superblock is combinatorially large. Here, we propose a deep-learning-based alternative framework that predicts intra-mode superblock partitions in the form of a four-level partition tree, using a hierarchical fully convolutional network (H-FCN). We created a large database of VP9 superblocks and their corresponding partitions to train an H-FCN model, which we then integrated with the VP9 encoder to reduce intra-mode encoding time. Experimental results show that our approach speeds up intra-mode encoding by 69.7% on average, at the expense of a 1.71% increase in the Bjøntegaard-Delta bitrate (BD-rate). VP9 provides several built-in speed levels that offer faster encoding at the cost of rate-distortion performance; our model outperforms the fastest recommended speed level of the reference VP9 encoder for the good quality intra encoding configuration, in terms of both speedup and BD-rate.
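To make the idea of a four-level partition tree concrete, the following is a minimal sketch rather than the encoder's or the paper's implementation. It assumes the tree levels correspond to block sizes 64, 32, 16, and 8, and that each node takes one of VP9's four partition types (none, horizontal, vertical, split). The `decide` callback is hypothetical: it stands in for whatever chooses the partition at each node, whether an RDO search or a model prediction. The names `partition_blocks` and `toy_decide` are illustrative only and are not part of any VP9 library.

```python
from typing import Callable, List, Tuple

# Partition types at each tree node (named after VP9's four partition types).
NONE, HORZ, VERT, SPLIT = "NONE", "HORZ", "VERT", "SPLIT"

Block = Tuple[int, int, int, int]  # (x, y, width, height) in pixels


def partition_blocks(
    x: int,
    y: int,
    size: int,
    decide: Callable[[int, int, int], str],
    min_size: int = 8,
) -> List[Block]:
    """Expand a partition tree rooted at a size x size block into leaf blocks.

    `decide` is a hypothetical stand-in for the per-node decision.
    Only SPLIT recurses; below `min_size` the four quadrants become leaves.
    """
    choice = decide(x, y, size)
    half = size // 2
    if choice == NONE:
        return [(x, y, size, size)]
    if choice == HORZ:
        return [(x, y, size, half), (x, y + half, size, half)]
    if choice == VERT:
        return [(x, y, half, size), (x + half, y, half, size)]
    if choice != SPLIT:
        raise ValueError(f"unknown partition type: {choice!r}")
    blocks: List[Block] = []
    for dy in (0, half):
        for dx in (0, half):
            if half >= min_size:
                blocks += partition_blocks(x + dx, y + dy, half, decide, min_size)
            else:
                blocks.append((x + dx, y + dy, half, half))
    return blocks


def toy_decide(x: int, y: int, size: int) -> str:
    """Hypothetical hand-written rule, used only to exercise the recursion."""
    if size == 64:
        return SPLIT
    if size == 32 and (x, y) == (0, 0):
        return SPLIT
    if size == 16 and (x, y) == (0, 0):
        return HORZ
    return NONE


leaves = partition_blocks(0, 0, 64, toy_decide)
for block in leaves:
    print(block)
```

In this sketch a predictor would replace `toy_decide`, so a full superblock partition requires at most one decision per visited node instead of an exhaustive cost comparison across candidate partitions.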