In short video streaming, popular adaptive bitrate (ABR) algorithms developed for classical long video applications fail catastrophically because they are tuned to adapt bitrates only. Short video adaptive bitrate (SABR) algorithms must instead jointly decide which video to prefetch and at which bitrate level, without sacrificing users' quality of experience (QoE) or incurring noticeable bandwidth wastage. Unfortunately, existing SABR methods suffer from slow convergence and poor generalization. In this paper, we propose Incendio, a novel SABR framework that applies Multi-Agent Reinforcement Learning (MARL) with Expert Guidance. Incendio separates the video ID decision and the video bitrate decision into a buffer management agent and a bitrate adaptation agent, respectively, to maximize a system-level utility score modeled as a compound function of QoE and bandwidth wastage metrics. Incendio is first initialized by imitating hand-crafted expert rules and then fine-tuned with MARL. Extensive experiments indicate that Incendio outperforms the current state-of-the-art SABR algorithm by 53.2% in utility score while maintaining low training complexity and inference time.
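To make the two-decision split concrete, the following is a minimal sketch and not the authors' implementation. The two functions are simple rule-based stand-ins for the learned buffer management and bitrate adaptation agents, and the linear utility is only one possible compound of QoE and bandwidth wastage. All names, the bitrate ladder, the thresholds, and the weights are assumptions made for illustration.

```python
from dataclasses import dataclass

# Hypothetical bitrate ladder in kbps (not taken from the paper).
BITRATES_KBPS = (750, 1200, 1850)


@dataclass
class VideoState:
    video_id: int
    buffered_s: float   # seconds of this video already buffered
    watch_prob: float   # assumed estimate that the user will watch it


def pick_video(videos, target_buffer_s=4.0):
    """Stand-in for the buffer management decision.

    Returns the ID of the video to prefetch next, or None to idle
    when every video already holds the (assumed) target buffer.
    """
    candidates = [v for v in videos if v.buffered_s < target_buffer_s]
    if not candidates:
        return None
    # Prefer videos that are likely to be watched and far from the target.
    best = max(candidates,
               key=lambda v: v.watch_prob * (target_buffer_s - v.buffered_s))
    return best.video_id


def pick_bitrate(throughput_kbps, safety=0.8):
    """Stand-in for the bitrate adaptation decision.

    Picks the highest ladder level below a discounted throughput estimate,
    falling back to the lowest level when none is affordable.
    """
    affordable = [b for b in BITRATES_KBPS if b <= safety * throughput_kbps]
    return max(affordable) if affordable else min(BITRATES_KBPS)


def utility(qoe, wasted_mb, waste_weight=0.5):
    """Assumed linear utility: QoE minus a penalty on wasted bandwidth."""
    return qoe - waste_weight * wasted_mb


videos = [
    VideoState(video_id=0, buffered_s=5.0, watch_prob=0.9),
    VideoState(video_id=1, buffered_s=1.0, watch_prob=0.6),
    VideoState(video_id=2, buffered_s=0.0, watch_prob=0.3),
]
chosen_video = pick_video(videos)
chosen_bitrate = pick_bitrate(throughput_kbps=2000)
```

Keeping the video choice and the bitrate choice in separate functions mirrors the separation described above: each decision has a smaller action space than a single policy that must select a (video, bitrate) pair at once.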