Several sixth generation (6G) use cases have tight requirements in terms of reliability and latency, in particular teleoperated driving (TD). To address those requirements, Predictive Quality of Service (PQoS), possibly combined with reinforcement learning (RL), has emerged as a valid approach to dynamically adapt the configuration of the TD application (e.g., the level of compression of automotive data) to the experienced network conditions. In this work, we explore different classes of RL algorithms for PQoS, namely MAB (stateless), SARSA (stateful on-policy), Q-Learning (stateful off-policy), and DSARSA and DDQN (with Neural Network (NN) approximation). We trained the agents in a federated learning (FL) setup to improve the convergence time and fairness, and to promote privacy and security. The goal is to optimize the trade-off between Quality of Service (QoS), measured in terms of the end-to-end latency, and Quality of Experience (QoE), measured in terms of the quality of the resulting compression operation. We show that Q-Learning uses a small number of learnable parameters, and is the best approach to perform PQoS in the TD scenario in terms of average reward, convergence, and computational cost.
翻译:暂无翻译