The Emergence of Wireless MAC Protocols with Multi-Agent Reinforcement Learning

In this paper, we propose a new framework that uses the multi-agent deep deterministic policy gradient (MADDPG) algorithm to enable a base station (BS) and user equipments (UEs) to learn a medium access control (MAC) protocol in a multiple access scenario. In this framework, the BS and the UEs are reinforcement learning (RL) agents that must learn to cooperate in order to deliver data. The network nodes can exchange control messages to coordinate data delivery across the network, but they have no prior agreement on the meaning of those messages. The agents therefore have to learn not only the channel access policy but also the signaling policy. We show that collaboration between agents is important by comparing the proposed algorithm with ablated versions in which either the communication between agents or the central critic is removed. A comparison with a contention-free baseline shows that our framework achieves higher goodput and can effectively be used to learn a new protocol.
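
The interaction described above can be illustrated with a toy slot-by-slot loop. This is only a sketch under stated assumptions and not the framework itself: there is no MADDPG training and no critic, and hand-written policies stand in for the learned actors. The collision channel (a slot succeeds only if exactly one UE transmits), the binary control symbols, the goodput definition (delivered packets per slot), and all names (`UEAction`, `channel_step`, `run_episode`, `ue_policy`, `bs_policy`) are hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class UEAction:
    transmit: bool   # channel access decision for this slot
    uplink_msg: int  # control symbol; its meaning is not agreed in advance


# A UE policy maps (packets in buffer, last downlink message) to an action.
UEPolicy = Callable[[int, int], UEAction]
# A BS policy maps (slot outcome, uplink messages) to one downlink message per UE.
BSPolicy = Callable[[Optional[int], List[int]], List[int]]


def channel_step(actions: List[UEAction]) -> Optional[int]:
    """Assumed collision channel: data gets through only if exactly one UE transmits."""
    senders = [i for i, a in enumerate(actions) if a.transmit]
    return senders[0] if len(senders) == 1 else None


def run_episode(ue_policies: List[UEPolicy], bs_policy: BSPolicy,
                buffers: List[int], num_slots: int) -> float:
    """Run the toy loop and return goodput as delivered packets per slot."""
    buffers = list(buffers)              # do not mutate the caller's list
    downlink = [0] * len(ue_policies)    # no downlink signaling before slot 0
    delivered = 0
    for _ in range(num_slots):
        actions = [p(b, d) for p, b, d in zip(ue_policies, buffers, downlink)]
        winner = channel_step(actions)
        if winner is not None and buffers[winner] > 0:
            buffers[winner] -= 1
            delivered += 1
        downlink = bs_policy(winner, [a.uplink_msg for a in actions])
    return delivered / num_slots


# Hand-written stand-ins for learned policies (assumption, not learned behavior):
# symbol 1 ends up acting as "I have data" uplink and as "you may send" downlink.
def ue_policy(buffer: int, downlink_msg: int) -> UEAction:
    return UEAction(transmit=(downlink_msg == 1 and buffer > 0),
                    uplink_msg=int(buffer > 0))


def bs_policy(winner: Optional[int], uplink_msgs: List[int]) -> List[int]:
    grants = [0] * len(uplink_msgs)
    for i, msg in enumerate(uplink_msgs):
        if msg == 1:        # grant the first UE that signals pending data
            grants[i] = 1
            break
    return grants


def greedy_policy(buffer: int, downlink_msg: int) -> UEAction:
    # Ignores signaling and transmits whenever it has data.
    return UEAction(transmit=buffer > 0, uplink_msg=0)


coordinated = run_episode([ue_policy, ue_policy], bs_policy, [2, 1], 6)
uncoordinated = run_episode([greedy_policy, greedy_policy], bs_policy, [2, 1], 6)
```

With two UEs that ignore signaling, every slot collides and nothing is delivered, whereas the coordinated pair delivers all three queued packets. In the proposed framework the agents would have to discover a comparable convention for the control symbols through training rather than have it written by hand.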
