Recent years have witnessed dramatic growth in Internet video traffic, where video bitstreams are often compressed and delivered in low quality to fit the streamer's uplink bandwidth. To alleviate this quality degradation, Neural-enhanced Video Streaming (NVS) has emerged, showing great promise for recovering low-quality videos, typically by deploying neural super-resolution (SR) on the media server. Despite this benefit, we reveal that current mainstream works with SR enhancement have not achieved the desired rate-distortion trade-off between bitrate saving and quality restoration, for three reasons: (1) they overemphasize enhancement on the decoder side while omitting co-design of the encoder, (2) they have limited generative capacity to recover high-fidelity perceptual details, and (3) they optimize the compression-and-restoration pipeline solely from the resolution perspective, without considering color bit-depth. To overcome these limitations, we are the first to conduct an encoder-decoder (i.e., codec) synergy by leveraging the inherent visual-generative property of diffusion models. Specifically, we present Codec-aware Diffusion Modeling (CaDM), a novel NVS paradigm that significantly reduces streaming delivery bitrates while offering considerably higher restoration capacity than existing methods. First, CaDM improves the encoder's compression efficiency by simultaneously reducing the resolution and color bit-depth of video frames. Second, CaDM empowers the decoder with high-quality enhancement by making the denoising diffusion restoration aware of the encoder's resolution-color conditions. Evaluation on public cloud services with OpenMMLab benchmarks shows that CaDM reduces bitrates by 5.12 to 21.44 times based on common video standards and achieves much better recovery quality (e.g., FID of 0.61) than state-of-the-art neural-enhancing methods.
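To make the encoder-side idea concrete, the following is a minimal sketch of jointly reducing resolution and color bit-depth on a single frame. It is not the CaDM implementation: the function names, the average-pooling downsampler, the bit-shift quantizer, and the default `scale` and `bits` values are all assumptions chosen for illustration, and `naive_restore` is only a placeholder for the diffusion-based decoder described above.

```python
import numpy as np


def reduce_resolution_and_bit_depth(frame, scale=2, bits=4):
    """Hypothetical encoder-side step (illustrative, not the paper's code).

    frame: uint8 array of shape (H, W, C).
    Returns an array of shape (H // scale, W // scale, C) whose values
    lie in [0, 2**bits - 1].
    """
    h, w, c = frame.shape
    # Crop so both spatial dimensions divide evenly by the scale factor.
    h2, w2 = h - h % scale, w - w % scale
    cropped = frame[:h2, :w2].astype(np.float32)
    # Average-pool each scale x scale block (assumed downsampler).
    pooled = cropped.reshape(h2 // scale, scale, w2 // scale, scale, c).mean(axis=(1, 3))
    # Drop the least significant bits to lower the color bit-depth.
    return np.round(pooled).astype(np.uint8) >> (8 - bits)


def naive_restore(low, scale=2, bits=4):
    """Placeholder decoder: nearest-neighbor upsampling plus a bit shift.

    A learned restorer conditioned on (scale, bits) would replace this.
    """
    up = np.repeat(np.repeat(low, scale, axis=0), scale, axis=1)
    return up.astype(np.uint8) << (8 - bits)


def raw_sample_ratio(scale=2, bits=4):
    """Ratio of raw (pre-codec) bits before versus after the reduction."""
    return (scale * scale) * (8 / bits)
```

With `scale=2` and `bits=4`, the raw sample volume shrinks by a factor of 8 before any standard codec is applied. The bitrate reductions reported in the abstract are measured on encoded streams, so this raw ratio should not be read as an estimate of them.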