The primary challenge of video streaming is balancing high video quality with smooth playback. Traditional codecs are well tuned for this trade-off, yet their inability to exploit context means they must encode all video data and transmit it to the client. This paper introduces ELVIS (End-to-end Learning-based VIdeo Streaming Enhancement Pipeline), an end-to-end architecture that combines server-side encoding optimizations with client-side generative inpainting to remove and reconstruct redundant video data. Its modular design allows ELVIS to integrate different codecs, inpainting models, and quality metrics, making it adaptable to future innovations. Our results show that current technologies achieve improvements of up to 11 VMAF points over baseline benchmarks, though computational demands still hinder real-time deployment. ELVIS represents a foundational step toward incorporating generative AI into video streaming pipelines, enabling higher-quality experiences without increased bandwidth requirements.