Deep reinforcement learning (DRL) algorithms have recently gained wide attention in the wireless networks domain. They are considered promising approaches for solving dynamic radio resource management (RRM) problems in next-generation networks. Given their ability to build an approximate and continuously updated model of the wireless network environment, DRL algorithms can cope with the multifaceted complexity of such environments. Nevertheless, several challenges hinder the practical adoption of DRL in commercial networks. In this article, we first discuss two key practical challenges that are faced, but rarely tackled, when developing DRL-based RRM solutions. We argue that addressing these challenges is essential if DRL is to find its way into commercial RRM solutions. In particular, we discuss the need for safe and accelerated DRL-based RRM solutions that mitigate the slow convergence and performance instability exhibited by DRL algorithms. We then review and categorize the main approaches used in the RRM domain to develop safe and accelerated DRL-based solutions. Finally, we conduct a case study to demonstrate the importance of safe and accelerated DRL-based RRM solutions. We employ multiple variants of transfer learning (TL) techniques to accelerate the convergence of DRL-based intelligent radio access network (RAN) slicing controllers. We also propose a hybrid TL-based approach and sigmoid-function-based rewards as examples of safe exploration in DRL-based RAN slicing.