Quality-Aware Deep Reinforcement Learning for Streaming in Infrastructure-Assisted Connected Vehicles

This paper proposes a deep reinforcement learning (DRL)-based video streaming scheme for mobility-aware vehicular networks, e.g., vehicles on a highway. We consider infrastructure-assisted, mmWave-based scenarios in which the macro base station (MBS) cannot serve vehicles directly because of the short range of mmWave beams, so small mmWave base stations (mBSs) along the road deliver the desired videos to users. For smoother streaming, the MBS proactively pushes video chunks to the mBSs to support the vehicles that each mBS currently covers or will soon cover. We formulate the dynamic video delivery scheme, which adaptively determines 1) which content, 2) at what quality, and 3) how many chunks to deliver proactively from the MBS to the mBSs, as a Markov decision process (MDP). Because it is difficult for the MBS to track all channel conditions and the network state is high-dimensional, we adopt the deep deterministic policy gradient (DDPG) algorithm for the DRL-based video delivery scheme. Finally, we show that the DRL agent learns a streaming policy that pursues high average quality while limiting packet drops, avoiding playback stalls, reducing quality fluctuations, and saving backhaul usage.
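
As a rough illustration of the decision and objective described above, the following minimal sketch encodes one proactive push (content, quality, number of chunks) and a per-step reward that trades average quality against packet drops, stalls, quality fluctuations, and backhaul usage. All names, fields, and weights here (PushAction, StepOutcome, reward, w_drop, and so on) are hypothetical and chosen for illustration only; the abstract does not specify the actual state, action encoding, or reward form. Note also that DDPG produces continuous actions, so a real agent would need some mapping from the actor output to discrete choices like these.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PushAction:
    """One proactive push from the MBS to an mBS (hypothetical structure)."""
    content_id: int   # which video to push
    quality: int      # quality level of the pushed chunks (higher is better)
    num_chunks: int   # how many chunks to push over the backhaul


@dataclass(frozen=True)
class StepOutcome:
    """What is observed after the push (hypothetical fields)."""
    played_quality: float  # average quality played by vehicles in this step
    prev_quality: float    # average quality played in the previous step
    dropped_chunks: int    # chunks dropped in this step
    stalled: bool          # True if playback stalled in this step


def reward(action: PushAction, outcome: StepOutcome,
           w_drop: float = 1.0, w_stall: float = 5.0,
           w_switch: float = 0.5, w_backhaul: float = 0.1) -> float:
    """Weighted reward: quality minus penalties (weights are assumptions)."""
    drop_penalty = w_drop * outcome.dropped_chunks
    stall_penalty = w_stall * (1.0 if outcome.stalled else 0.0)
    # Penalize quality fluctuation between consecutive steps.
    switch_penalty = w_switch * abs(outcome.played_quality - outcome.prev_quality)
    # Every pushed chunk consumes backhaul capacity.
    backhaul_penalty = w_backhaul * action.num_chunks
    return (outcome.played_quality - drop_penalty - stall_penalty
            - switch_penalty - backhaul_penalty)
```

With these assumed weights, a stall costs far more than a single dropped chunk, which reflects the stated priority of avoiding playback stalls; the relative weights would have to be tuned in any real setup.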
