Reinforcement learning (RL) for network slicing is considered in the 5G radio access network, where the base station (gNodeB) allocates resource blocks (RBs) to requests from user equipment (UEs) and maximizes the total reward of accepted requests over time. Based on adversarial machine learning, a novel over-the-air attack is introduced to manipulate the RL algorithm and disrupt 5G network slicing. Subject to an energy budget, the adversary observes the spectrum and builds its own RL-based surrogate model that selects which RBs to jam, with the objective of maximizing the number of network slicing requests that fail due to jammed RBs. By jamming the RBs, the adversary reduces the RL algorithm's reward. As this reward is used as the input to update the RL algorithm, the performance does not recover even after the adversary stops jamming. This attack is evaluated in terms of the recovery time and the (maximum and total) reward loss, and it is shown to be much more effective than benchmark (random and myopic) jamming attacks. Different reactive and proactive defense mechanisms (protecting the RL algorithm's updates or misleading the adversary's learning process) are introduced to show that it is viable to defend 5G network slicing against this attack.
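To make the described interaction concrete, the following is a minimal, self-contained sketch of the setup, not the authors' implementation. It assumes tabular value learning (simplified to bandit-style updates rather than full Q-learning) and a toy model in which each accepted request occupies one RB. All constants and names (NUM_RBS, JAM_BUDGET, the state encoding, and the reward shaping) are illustrative assumptions, not details taken from the paper.

```python
"""Toy sketch of RL-based RB allocation at the gNodeB and an RL-based
surrogate jammer that poisons the gNodeB's reward feedback. Illustrative
assumptions throughout; not the paper's algorithm or parameters."""
import random
from collections import defaultdict

NUM_RBS = 8     # RBs available per time step (assumed)
JAM_BUDGET = 2  # RBs the adversary can jam per step, a proxy for its energy budget (assumed)
EPS, ALPHA = 0.1, 0.1  # exploration rate and learning rate (assumed)

def epsilon_greedy(q, state, actions):
    """Pick an action epsilon-greedily from a Q-table keyed by (state, action)."""
    if random.random() < EPS:
        return random.choice(actions)
    return max(actions, key=lambda a: q[(state, a)])

q_gnb = defaultdict(float)  # gNodeB agent: how many incoming requests to accept
q_adv = defaultdict(float)  # adversary's surrogate model: which RB indices to jam

def run_step(demand, jamming_on):
    """One time step: the gNodeB allocates RBs to requests; the adversary may jam RBs."""
    state = demand  # toy state: number of pending slicing requests
    accepted = epsilon_greedy(q_gnb, state, list(range(demand + 1)))
    accepted = min(accepted, NUM_RBS)  # one RB per accepted request (assumed)

    jammed = set()
    if jamming_on:
        # The adversary picks up to JAM_BUDGET RBs, guided by its surrogate Q-table.
        for _ in range(JAM_BUDGET):
            jammed.add(epsilon_greedy(q_adv, state, list(range(NUM_RBS))))

    # Requests on jammed RBs fail; the gNodeB only ever sees the reduced reward.
    served = sum(1 for rb in range(accepted) if rb not in jammed)
    failed = accepted - served

    # Each agent updates from its own reward: the gNodeB from the poisoned
    # (post-jamming) reward, the adversary from the number of failed requests.
    q_gnb[(state, accepted)] += ALPHA * (served - q_gnb[(state, accepted)])
    for rb in jammed:
        q_adv[(state, rb)] += ALPHA * (failed - q_adv[(state, rb)])
    return served

# Train, jam for a window, then stop jamming and observe the slow recovery.
for t in range(3000):
    run_step(random.randint(1, NUM_RBS), jamming_on=1000 <= t < 2000)
```

The sketch mirrors the mechanism the abstract highlights: because the gNodeB's value updates consume the jammed (poisoned) reward, the learned policy degrades during the attack window and does not immediately recover once jamming stops.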