To meet the heterogeneous requirements of beyond 5G (B5G) and future 6G wireless networks, conventional medium access control (MAC) procedures need to evolve so that base stations (BSs) and user equipments (UEs) can automatically learn new MAC protocols that cater to extremely diverse services. This topic has received significant attention, and several reinforcement learning (RL) algorithms have been proposed in which BSs and UEs are cast as agents that learn a communication policy from their local observations. However, current approaches typically overfit to the environment they are trained in and lack robustness to unseen conditions, so they fail to generalize to different environments. To overcome this problem, instead of learning a policy in the high-dimensional and redundant observation space, we leverage observation abstraction (OA), which extracts the useful information from the environment. This in turn allows learning communication protocols that are more robust and generalize much better than current baselines. To learn the abstracted information from observations, we propose an autoencoder (AE)-based architecture and embed it in a multi-agent proximal policy optimization (MAPPO) framework. Simulation results corroborate the effectiveness of abstraction for protocol learning: the learned protocols generalize across environments that differ in the number of UEs, the number of data packets to transmit, and channel conditions.
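As a rough illustration of the observation abstraction idea, the sketch below trains a tiny linear autoencoder that compresses redundant observation vectors into a low-dimensional code. It is a toy under stated assumptions, not the architecture proposed in this work: the abstract does not specify layer sizes, activations, or the loss, so the names `make_observations` and `LinearAutoencoder`, the synthetic data, and all dimensions are hypothetical. In a MAPPO-style setup, the encoder output would presumably replace the raw observation as the policy input; that policy part is not implemented here.

```python
import numpy as np


def make_observations(n, obs_dim=8, latent_dim=2, seed=0):
    """Synthetic, redundant observations (assumption: not real MAC data).

    Each obs_dim-dimensional observation is a linear mix of only
    latent_dim underlying factors, so most coordinates are redundant.
    """
    rng = np.random.default_rng(seed)
    factors = rng.normal(size=(n, latent_dim))
    mixing = rng.normal(size=(latent_dim, obs_dim))
    return factors @ mixing


class LinearAutoencoder:
    """Toy linear autoencoder: obs -> abstract code -> reconstructed obs."""

    def __init__(self, obs_dim, latent_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.w_enc = rng.normal(scale=0.1, size=(obs_dim, latent_dim))
        self.w_dec = rng.normal(scale=0.1, size=(latent_dim, obs_dim))

    def encode(self, obs):
        # The abstracted observation that a policy network would consume.
        return obs @ self.w_enc

    def decode(self, code):
        return code @ self.w_dec

    def loss(self, obs):
        # Mean squared reconstruction error.
        recon = self.decode(self.encode(obs))
        return float(np.mean((recon - obs) ** 2))

    def train_step(self, obs, lr):
        # One full-batch gradient step on the squared reconstruction
        # error (gradients are correct up to a constant scale factor).
        n = obs.shape[0]
        code = self.encode(obs)
        err = self.decode(code) - obs
        grad_dec = code.T @ err / n
        grad_enc = obs.T @ (err @ self.w_dec.T) / n
        self.w_dec -= lr * grad_dec
        self.w_enc -= lr * grad_enc

    def fit(self, obs, steps=2000, lr=0.005):
        for _ in range(steps):
            self.train_step(obs, lr)
        return self


if __name__ == "__main__":
    obs = make_observations(256)
    ae = LinearAutoencoder(obs_dim=obs.shape[1], latent_dim=2)
    print("reconstruction error before training:", ae.loss(obs))
    ae.fit(obs)
    print("reconstruction error after training:", ae.loss(obs))
    print("abstract observation shape:", ae.encode(obs).shape)
```

The design point is only that the policy would see the 2-dimensional code rather than the 8-dimensional raw observation; a practical system would likely use a nonlinear encoder and train it on observations collected from the agents.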