Remote Electrical Tilt (RET) optimization is an efficient method for adjusting the vertical tilt angle of Base Station (BS) antennas in order to optimize the Key Performance Indicators (KPIs) of the network. Reinforcement Learning (RL) provides a powerful framework for RET optimization because of its self-learning capabilities and its adaptivity to environmental changes. However, an RL agent may execute unsafe actions while interacting with the network, i.e., actions that result in undesired degradation of network performance. Since service reliability is critical for Mobile Network Operators (MNOs), the prospect of performance degradation has so far prevented the real-world deployment of RL methods for RET optimization. In this work, we model the RET optimization problem in the Safe Reinforcement Learning (SRL) framework, with the goal of learning a tilt control strategy that provides performance improvement guarantees with respect to a safe baseline. We leverage a recent SRL method, Safe Policy Improvement through Baseline Bootstrapping (SPIBB), to learn an improved policy from an offline dataset of interactions collected by the safe baseline. Our experiments show that the proposed approach is able to learn a safe and improved tilt update policy, providing a higher degree of reliability and greater potential for real-world network deployment.
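To make the idea of bootstrapping on the baseline concrete, the following is a minimal sketch of one policy improvement step in the Πb-SPIBB variant for a small tabular problem. It is an illustration under stated assumptions, not the implementation used in this work: the tabular setting, the function name `spibb_greedy_step`, the threshold name `n_wedge`, and the three-action tilt example (downtilt, keep, uptilt) with its numbers are all hypothetical. The rule it shows is that state-action pairs seen fewer than `n_wedge` times in the offline dataset keep the baseline's probability, and only the remaining probability mass is moved to the best well-observed action.

```python
import numpy as np


def spibb_greedy_step(q, pi_b, counts, n_wedge):
    """One Pi_b-SPIBB-style improvement step (tabular, illustrative).

    q:       (S, A) action-value estimates from the offline dataset
    pi_b:    (S, A) baseline policy probabilities
    counts:  (S, A) number of times each state-action pair was observed
    n_wedge: pairs with fewer observations than this copy the baseline
    """
    bootstrapped = counts < n_wedge
    # Rarely observed pairs keep exactly the baseline probability.
    pi = np.where(bootstrapped, pi_b, 0.0)
    # Probability mass that may be reallocated in each state.
    free_mass = 1.0 - pi.sum(axis=1)
    for s in range(q.shape[0]):
        trusted = np.flatnonzero(~bootstrapped[s])
        if trusted.size == 0:
            continue  # nothing is well observed: policy equals the baseline
        best = trusted[np.argmax(q[s, trusted])]
        pi[s, best] += free_mass[s]
    return pi


# Hypothetical example: 2 states, 3 tilt actions (downtilt, keep, uptilt).
pi_b = np.array([[0.2, 0.6, 0.2],
                 [0.3, 0.4, 0.3]])
counts = np.array([[50, 200, 3],
                   [2, 5, 1]])
q = np.array([[1.0, 0.5, 2.0],
              [0.1, 0.9, 0.3]])

pi = spibb_greedy_step(q, pi_b, counts, n_wedge=10)
print(pi)
```

In the first state, the uptilt action has the highest estimated value but only three observations, so it keeps its baseline probability, and the free mass goes to the best well-observed action instead. In the second state no pair is well observed, so the returned policy coincides with the baseline. A larger `n_wedge` keeps the learned policy closer to the baseline, trading potential improvement for safety.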