In this paper, reinforcement learning (RL) for network slicing is considered in NextG radio access networks, where the base station (gNodeB) allocates resource blocks (RBs) to the requests of user equipment (UEs) and aims to maximize the total reward of accepted requests over time. Based on adversarial machine learning, a novel over-the-air attack is introduced to manipulate the RL algorithm and disrupt NextG network slicing. The adversary observes the spectrum and builds its own RL-based surrogate model that selects which RBs to jam subject to an energy budget, with the objective of maximizing the number of requests that fail due to jammed RBs. By jamming the RBs, the adversary reduces the RL algorithm's reward. As this reward is used as the input to update the RL algorithm, the performance does not recover even after the adversary stops jamming. This attack is evaluated in terms of both the recovery time and the (maximum and total) reward loss, and it is shown to be much more effective than benchmark (random and myopic) jamming attacks. Different reactive and proactive defense schemes (protecting the RL algorithm's updates or misleading the adversary's learning process) are introduced to show that it is viable to defend NextG network slicing against this attack.
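To make the adversary's operation concrete, the following is a minimal sketch, not the paper's actual algorithm: it reduces the surrogate model to a bandit-style Q-learning update over individual RBs, where the adversary learns which RBs are most likely to carry accepted requests and jams those within its energy budget. All names and parameters (NUM_RBS, ENERGY_BUDGET, the toy gnb_allocation stand-in for the gNodeB's slicing decision) are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_RBS = 10          # resource blocks available at the gNodeB (assumed value)
ENERGY_BUDGET = 3     # max RBs the adversary can jam per time slot (assumed value)
EPSILON = 0.1         # exploration rate of the surrogate model
ALPHA = 0.1           # learning rate
NUM_SLOTS = 2000

# Per-RB action values learned by the adversary's surrogate model:
# Q[i] estimates how often jamming RB i causes an accepted request to fail.
Q = np.zeros(NUM_RBS)

def gnb_allocation():
    """Toy stand-in for the gNodeB's slicing decision: some RBs are more
    likely than others to carry accepted requests (assumption for illustration)."""
    popularity = np.linspace(0.9, 0.1, NUM_RBS)
    return rng.random(NUM_RBS) < popularity  # True where an RB carries a request

for _ in range(NUM_SLOTS):
    occupied = gnb_allocation()

    # Epsilon-greedy choice of which RBs to jam, subject to the energy budget.
    if rng.random() < EPSILON:
        jammed = rng.choice(NUM_RBS, size=ENERGY_BUDGET, replace=False)
    else:
        jammed = np.argsort(Q)[-ENERGY_BUDGET:]

    # Adversary's reward: number of requests that fail because their RB is jammed.
    failed = occupied[jammed].astype(float)

    # Stateless Q-learning update toward the observed outcome.
    Q[jammed] += ALPHA * (failed - Q[jammed])

print("Learned per-RB jamming values:", np.round(Q, 2))
print("RBs the surrogate model targets:", np.sort(np.argsort(Q)[-ENERGY_BUDGET:]))
```

In the paper's setting, the surrogate model additionally conditions on the observed spectrum state and interacts with the gNodeB's own RL algorithm, so the attack's effect compounds over time through the victim's reward-driven updates; the sketch above only captures the target-selection step under an energy budget.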