Communication protocols are the languages used by network nodes. Before a user equipment (UE) can exchange data with a base station (BS), it must first negotiate the conditions and parameters for that transmission. This negotiation is supported by signaling messages at all layers of the protocol stack. Each year, the mobile communications industry defines and standardizes these messages, which are designed by humans during lengthy technical (and often political) debates. Following this standardization effort, the development phase begins, in which the industry interprets and implements the resulting standards. But is this massive development undertaking the only way to implement a given protocol? We ask whether radios can learn a given target protocol as an intermediate step towards evolving their own. Furthermore, we train cellular radios to develop a channel access policy that performs optimally under the constraints of the target protocol. We show that multi-agent reinforcement learning (MARL) and learning-to-communicate (L2C) techniques achieve this goal with gains over expert systems. Finally, we provide insight into the transferability of these results to scenarios never seen during training.