Maximizing quality of experience (QoE) for interactive video streaming has been a long-standing challenge, as its delay-sensitive nature makes it more vulnerable to bandwidth fluctuations. While reinforcement learning (RL) has demonstrated great potential, existing works are either limited by fixed models or require enormous data and time for online adaptation, and thus struggle to fit time-varying and diverse network states. Driven by these practical concerns, we perform large-scale measurements on WeChat for Business's interactive video service to study real-world network fluctuations. Surprisingly, our analysis shows that, in contrast to the time-varying network metrics themselves, network sequences exhibit noticeable short-term continuity, which is sufficient to meet few-shot learning requirements. We thus propose Fiammetta, the first meta-RL-based bitrate adaptation algorithm for interactive video streaming. Building on this short-term continuity, Fiammetta accumulates learning experiences through offline meta-training and achieves fast online adaptation to changing network states through only a few gradient updates. Moreover, Fiammetta innovatively incorporates a probing mechanism for real-time monitoring of network states, and proposes an adaptive meta-testing mechanism for seamless adaptation. We implement Fiammetta on a testbed whose end-to-end network follows the real-world WeChat for Business traces. The results show that Fiammetta outperforms prior algorithms significantly, improving video bitrate by 3.6%-16.2% without increasing the stalling rate.
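The core idea of "offline meta-training plus a few online gradient updates" can be illustrated with a minimal MAML-style sketch. This is a hypothetical toy example, not Fiammetta's actual implementation: a "task" stands in for a network state, a scalar parameter theta stands in for the policy, and the outer loop learns an initialization from which a handful of inner gradient steps suffice to adapt to a new task.

```python
# Toy MAML-style meta-learning sketch (hypothetical; all names are
# illustrative, not from the paper). Each task is a scalar regression
# target, analogous to a distinct network state.

def loss(theta, target):
    # Quadratic loss: how far the current parameter is from the task optimum.
    return 0.5 * (theta - target) ** 2

def grad(theta, target):
    # Gradient of the quadratic loss with respect to theta.
    return theta - target

def meta_train(tasks, meta_lr=0.1, inner_lr=0.5, steps=500):
    # Outer loop: find an initialization theta such that ONE inner
    # gradient step already performs well on every training task.
    theta = 0.0
    for _ in range(steps):
        meta_grad = 0.0
        for t in tasks:
            adapted = theta - inner_lr * grad(theta, t)  # inner update
            # Gradient of the post-adaptation loss w.r.t. theta;
            # the chain rule contributes the factor (1 - inner_lr).
            meta_grad += (1 - inner_lr) * grad(adapted, t)
        theta -= meta_lr * meta_grad / len(tasks)
    return theta

def adapt(theta, target, inner_lr=0.5, k=3):
    # Few-shot online adaptation: only k gradient updates on the new task,
    # analogous to adapting the bitrate policy to a newly observed network state.
    for _ in range(k):
        theta -= inner_lr * grad(theta, target)
    return theta

theta0 = meta_train(tasks=[-1.0, 0.0, 2.0])   # offline meta-training
theta_new = adapt(theta0, target=1.5)          # fast online adaptation
```

In this toy setting the meta-learned initialization converges to the task mean, and three inner steps then close most of the remaining gap to an unseen task; the paper's short-term continuity finding plays the role of guaranteeing that consecutive "tasks" stay close enough for such few-step adaptation to work.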