Recent years have witnessed dramatic growth in Internet video traffic, where video bitstreams are often compressed and delivered at low quality to fit the streamer's uplink bandwidth. To alleviate this quality degradation, Neural-enhanced Video Streaming (NVS) has emerged, showing great promise for recovering low-quality videos, mostly by deploying neural super-resolution (SR) on the media server. Despite these benefits, we reveal that current mainstream SR-enhanced approaches have not achieved the desired rate-distortion trade-off between bitrate saving and quality restoration, because they (1) overemphasize enhancement on the decoder side while neglecting co-design with the encoder, (2) have inherently limited restoration capacity for generating high-fidelity perceptual details, and (3) optimize the compression-and-restoration pipeline solely from the resolution perspective, without considering color bit-depth. To overcome these limitations, we are the first to exploit encoder-decoder (i.e., codec) synergy by leveraging the visual-synthesis capability of diffusion models. Specifically, we present Codec-aware Diffusion Modeling (CaDM), a novel NVS paradigm that significantly reduces streaming delivery bitrate while offering considerably higher restoration capacity than existing methods. First, CaDM improves the encoder's compression efficiency by simultaneously reducing the resolution and color bit-depth of video frames. Second, CaDM strengthens quality enhancement at the decoder by making the denoising diffusion restoration aware of the encoder's resolution-color conditions. Evaluation on public cloud services with OpenMMLab benchmarks shows that CaDM reduces streaming bitrate by nearly 100 times compared with vanilla H.264 and achieves much better recovery quality (e.g., an FID of 0.61) than state-of-the-art neural-enhancing methods.
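To make the encoder-side idea concrete, the following is a minimal sketch of reducing resolution and color bit-depth together on a single raw RGB frame. It is an illustration under stated assumptions, not CaDM's actual encoder: the function names `reduce_frame` and `raw_bits`, the box-average downsampling, and the bit-shift quantization are all hypothetical choices made here, and the size figure counts raw pixels only, before any H.264 coding.

```python
def reduce_frame(frame, scale=2, bits=4):
    """Downsample an 8-bit RGB frame by `scale` and keep `bits` bits per channel.

    `frame` is a list of rows; each row is a list of (r, g, b) tuples in 0..255.
    Hypothetical helper for illustration only; not the paper's encoder.
    """
    if not 1 <= bits <= 8:
        raise ValueError("bits must be between 1 and 8")
    # Crop so that height and width are multiples of the scale factor.
    height = len(frame) - len(frame) % scale
    width = len(frame[0]) - len(frame[0]) % scale
    shift = 8 - bits  # number of low-order bits dropped per channel
    out = []
    for y in range(0, height, scale):
        row = []
        for x in range(0, width, scale):
            pixel = []
            for ch in range(3):
                # Resolution reduction: average each scale x scale block.
                total = sum(frame[y + dy][x + dx][ch]
                            for dy in range(scale) for dx in range(scale))
                mean = total // (scale * scale)
                # Bit-depth reduction: keep only the top `bits` bits.
                pixel.append(mean >> shift)
            row.append(tuple(pixel))
        out.append(row)
    return out


def raw_bits(height, width, bits, channels=3):
    """Size in bits of an uncompressed frame (assumed measure, pre-codec)."""
    return height * width * channels * bits


frame = [[(255, 128, 0)] * 6 for _ in range(4)]  # 4x6 test frame
small = reduce_frame(frame, scale=2, bits=4)
ratio = raw_bits(4, 6, 8) / raw_bits(len(small), len(small[0]), 4)
print(len(small), len(small[0]), small[0][0], ratio)  # 2 3 (15, 8, 0) 8.0
```

In this toy setting, halving each spatial dimension and halving the bit-depth shrinks the raw frame by a factor of 8. The bitrate after actual video coding would differ, and restoring the dropped detail is the job the abstract assigns to the condition-aware diffusion decoder.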