Safe interaction with the environment is one of the most challenging aspects of Reinforcement Learning (RL) when applied to real-world problems. It is particularly important when unsafe actions have a high or irreversible negative impact on the environment. In network management operations, Remote Electrical Tilt (RET) optimisation is a safety-critical application in which exploratory modifications of the antenna tilt angles of base stations can cause significant performance degradation in the network. In this paper, we propose a modular Safe Reinforcement Learning (SRL) architecture and use it to address RET optimisation in cellular networks. In this approach, a safety shield continuously benchmarks the performance of RL agents against safe baselines and determines the safe antenna tilt updates to be performed on the network. Our results demonstrate improved performance of the SRL agent over the baseline while ensuring the safety of the performed actions.
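The safety shield described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the shield benchmarks the agent, so everything here is an assumption made for illustration: the `SafetyShield` class, the `estimate` performance model, the `margin` parameter, and the toy quadratic KPI are all hypothetical. The sketch assumes the shield accepts the RL agent's proposed tilt change only when its estimated performance is at least that of the safe baseline's proposal, and falls back to the baseline otherwise.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SafetyShield:
    """Hypothetical shield: picks between an RL proposal and a safe baseline.

    `estimate` is an assumed performance model mapping
    (current_tilt, tilt_change) to a predicted KPI (higher is better).
    `margin` is an assumed tolerance by which the agent may trail the baseline.
    """

    estimate: Callable[[float, float], float]
    margin: float = 0.0

    def select(self, tilt: float, agent_change: float, baseline_change: float) -> float:
        agent_score = self.estimate(tilt, agent_change)
        baseline_score = self.estimate(tilt, baseline_change)
        # Accept the agent's action only if it is predicted to perform
        # at least as well as the baseline (within the margin).
        if agent_score >= baseline_score - self.margin:
            return agent_change
        return baseline_change


def toy_kpi(tilt: float, change: float) -> float:
    """Toy performance model (assumption): KPI peaks at a tilt of 8 degrees."""
    return -((tilt + change - 8.0) ** 2)


shield = SafetyShield(estimate=toy_kpi)

# The agent proposes a small step toward the optimum; the baseline keeps the tilt.
accepted = shield.select(tilt=6.0, agent_change=1.0, baseline_change=0.0)

# The agent proposes an overshooting step; the shield falls back to the baseline.
rejected = shield.select(tilt=6.0, agent_change=5.0, baseline_change=0.0)
```

In this toy setting, the first proposal is passed through to the network and the second is replaced by the baseline action. A real shield would need a performance estimate derived from network measurements rather than a closed-form function.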