Adaptive video streaming relies on the construction of efficient bitrate ladders to deliver the best possible visual quality to viewers under bandwidth constraints. The traditional method of content-dependent bitrate ladder selection requires a video shot to be pre-encoded with multiple encoding parameters to find the optimal operating points given by the convex hull of the resulting rate-quality curves. However, this pre-encoding step is equivalent to an exhaustive search over the space of possible encoding parameters, which incurs significant overhead in both computation and time. To reduce this overhead, we propose a deep learning-based method of content-aware convex hull prediction. We employ a recurrent convolutional network (RCN) to implicitly analyze the spatiotemporal complexity of video shots in order to predict their convex hulls. A two-step transfer learning scheme is adopted to train our proposed RCN-Hull model, which ensures sufficient content diversity for analyzing scene complexity, while also making it possible to capture the scene statistics of pristine source videos. Our experimental results reveal that our proposed model yields better approximations of the optimal convex hulls, and offers competitive time savings as compared to existing approaches. On average, the pre-encoding time was reduced by 58.0% by our method, the average Bjøntegaard delta bitrate (BD-rate) of the predicted convex hulls against the ground truth was 0.08%, and the mean absolute deviation of the BD-rate distribution was 0.44%.
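To make the convex-hull selection step concrete, the sketch below computes the Pareto-optimal operating points (the upper convex hull) from a set of pre-encoded (bitrate, quality) measurements, as in the exhaustive search described above. The point values and variable names are illustrative assumptions, not data from the paper.

```python
def rate_quality_convex_hull(points):
    """Return the upper convex hull of (bitrate, quality) points,
    i.e. the Pareto-optimal operating points of a bitrate ladder."""
    pts = sorted(set(points))  # sort by bitrate, then quality
    hull = []
    for p in pts:
        # Pop the last point while it lies on or below the segment from
        # hull[-2] to p (a non-right turn means it is not on the upper hull).
        while len(hull) >= 2 and _cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    return hull


def _cross(o, a, b):
    # 2-D cross product of vectors o->a and o->b; positive = left turn.
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


# Hypothetical (bitrate in kbps, quality score) pairs from encodes of one
# shot at several resolutions and rates.
encodes = [(500, 62.0), (1000, 75.0), (1500, 80.0),
           (2000, 88.0), (3000, 91.0), (1200, 70.0)]
hull = rate_quality_convex_hull(encodes)
# Encodes dominated by the hull, e.g. (1500, 80.0), are discarded.
```

In the exhaustive approach, every candidate encode must be produced before this selection can run; RCN-Hull instead predicts which operating points will lie on the hull, so only those need to be encoded.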