Safe interaction with the environment is one of the most challenging aspects of applying Reinforcement Learning (RL) to real-world problems, particularly when unsafe actions have a severe or irreversible negative impact on the environment. In network management, Remote Electrical Tilt (RET) optimisation is a safety-critical application: exploratory changes to the antenna tilt angles of Base Stations (BSs) can cause significant performance degradation in the network. In this paper, we propose a modular Safe Reinforcement Learning (SRL) architecture and apply it to RET optimisation in cellular networks. In this approach, a safety shield continuously benchmarks the performance of RL agents against safe baselines and determines which antenna tilt updates are safe to perform on the network. Our results demonstrate that the SRL agent improves on the baseline's performance while ensuring the safety of the actions performed.
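To make the role of the safety shield concrete, the following is a minimal sketch of one way a shield could arbitrate between an RL agent and a safe baseline. It is an illustration under stated assumptions, not the architecture evaluated in the paper: the class `SafetyShield`, the `estimate` performance model, the `margin` tolerance, the 0 to 15 degree tilt range, and the toy KPI function are all hypothetical names and values introduced here.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SafetyShield:
    """Illustrative shield (hypothetical, not the paper's design).

    It accepts the agent's proposed tilt update only if the resulting tilt
    stays within an assumed mechanical range and the estimated performance
    is not worse than the baseline's by more than `margin`.
    """

    # Assumed performance model: (current_tilt, tilt_update) -> estimated KPI.
    estimate: Callable[[float, float], float]
    margin: float = 0.0
    min_tilt: float = 0.0    # assumed lower tilt limit, in degrees
    max_tilt: float = 15.0   # assumed upper tilt limit, in degrees

    def select(self, tilt: float, agent_action: float, baseline_action: float) -> float:
        """Return the tilt update that should be applied to the network."""
        new_tilt = tilt + agent_action
        in_range = self.min_tilt <= new_tilt <= self.max_tilt
        agent_score = self.estimate(tilt, agent_action)
        baseline_score = self.estimate(tilt, baseline_action)
        if in_range and agent_score >= baseline_score - self.margin:
            return agent_action
        # Fall back to the safe baseline whenever the agent cannot be trusted.
        return baseline_action


def toy_kpi(tilt: float, update: float) -> float:
    """Toy stand-in for a KPI estimate: best performance at 8 degrees."""
    return -((tilt + update - 8.0) ** 2)


shield = SafetyShield(estimate=toy_kpi)
accepted = shield.select(tilt=6.0, agent_action=1.0, baseline_action=0.0)
rejected = shield.select(tilt=8.0, agent_action=1.0, baseline_action=0.0)
out_of_range = shield.select(tilt=15.0, agent_action=1.0, baseline_action=0.0)
```

In this toy setting the shield applies the agent's update when it moves the tilt towards the assumed optimum, and falls back to the baseline's "no change" action when the update would degrade the estimated KPI or leave the allowed range. In a real deployment the estimate would come from network measurements or a learned model rather than a closed-form function.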