Providing constant-quality streams can simultaneously guarantee user experience and avoid wasting bit-rate. In this paper, we propose a novel deep-learning-based two-pass encoder parameter prediction framework that decides the rate factor (RF) with which the encoder can output streams of constant quality. For each one-shot segment in a video, the proposed method first extracts spatial, temporal and pre-coding features using an ultra-fast pre-processing step. Based on these features, a deep neural network predicts an RF value. The video encoder uses this RF to compress the segment as the first encoding pass, and the VMAF quality of the first-pass encode is then measured. If the quality does not meet the target, a second pass of RF prediction and encoding is performed. Because the second-pass prediction receives the first-pass predicted RF and the corresponding measured quality as feedback, it is highly accurate. Experiments show that the proposed method requires only 1.55 times the encoding complexity on average, while the accuracy, defined as the fraction of compressed videos whose actual VMAF lies within $\pm1$ of the target VMAF, reaches 98.88%.
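The control flow described above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the function names (`two_pass_encode`, `predict_first`, `predict_second`, `encode_and_measure`), the default tolerance of 1 VMAF point used as the "meets target" test, and the toy linear RF-to-VMAF model are all assumptions made for the example. In the proposed method the two predictors are deep neural networks and the quality comes from a real encode followed by a VMAF measurement.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class PassResult:
    rf: float      # rate factor used for the final encode
    vmaf: float    # measured VMAF of the final encode
    passes: int    # number of encoding passes performed (1 or 2)


def two_pass_encode(
    features: Sequence[float],
    target_vmaf: float,
    predict_first: Callable[[Sequence[float], float], float],
    predict_second: Callable[[Sequence[float], float, float, float], float],
    encode_and_measure: Callable[[float], float],
    tolerance: float = 1.0,  # assumed acceptance band around the target
) -> PassResult:
    """Encode one segment, adding a feedback-driven second pass if needed."""
    # First pass: predict RF from the segment features alone, then encode.
    rf1 = predict_first(features, target_vmaf)
    vmaf1 = encode_and_measure(rf1)
    if abs(vmaf1 - target_vmaf) <= tolerance:
        return PassResult(rf1, vmaf1, 1)

    # Second pass: the predictor also sees the first RF and its measured VMAF.
    rf2 = predict_second(features, target_vmaf, rf1, vmaf1)
    vmaf2 = encode_and_measure(rf2)
    return PassResult(rf2, vmaf2, 2)


# --- Toy stand-ins (hypothetical, only to make the sketch executable) ---
SLOPE = 1.5  # assumed VMAF points lost per unit of RF in the toy model


def toy_encode_and_measure(rf: float) -> float:
    # Stand-in for "encode with this RF, then measure VMAF".
    return 100.0 - SLOPE * rf


def toy_predict_first(features: Sequence[float], target_vmaf: float) -> float:
    # Stand-in for the first-pass network: a fixed initial guess.
    return 23.0


def toy_predict_second(
    features: Sequence[float], target_vmaf: float, rf1: float, vmaf1: float
) -> float:
    # Stand-in for the second-pass network: correct RF using the observed error.
    return rf1 + (vmaf1 - target_vmaf) / SLOPE


result = two_pass_encode(
    features=[0.0],
    target_vmaf=70.0,
    predict_first=toy_predict_first,
    predict_second=toy_predict_second,
    encode_and_measure=toy_encode_and_measure,
)
```

Passing the predictors and the encoder in as callables keeps the loop independent of any particular encoder or model, so the same skeleton could wrap a real encoder invocation and a trained network.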