In this paper, we explore multi-agent reinforcement learning (MARL) techniques to design grant-free random access (RA) schemes for low-complexity, low-power, battery-operated devices in massive machine-type communication (mMTC) wireless networks. We use the value decomposition network (VDN) and QMIX algorithms with parameter sharing (PS) under the centralized training and decentralized execution (CTDE) paradigm while maintaining scalability. We then compare the policies learned by VDN, QMIX, and the deep recurrent Q-network (DRQN), and explore the impact of including agent identifiers in the observation vector. We show that the MARL-based RA schemes can achieve a better throughput-fairness trade-off between agents without conditioning on agent identifiers. We also present a novel correlated traffic model, which is more descriptive of mMTC scenarios, and show that the proposed algorithm can easily adapt to traffic non-stationarities.
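To make the VDN factorization mentioned above concrete, the following is a minimal numerical sketch (not from the paper) of its additive value decomposition, Q_tot = Σ_i Q_i, using random per-agent utilities as stand-ins for learned networks; the agent and action counts are illustrative only. It checks that per-agent greedy action selection (decentralized execution) also maximizes the joint value, which is the property CTDE relies on at run time.

```python
import numpy as np

rng = np.random.default_rng(0)
n_agents, n_actions = 3, 2  # hypothetical sizes, for illustration only

# Per-agent utilities Q_i(o_i, a_i); random stand-ins for learned Q-networks.
per_agent_q = rng.normal(size=(n_agents, n_actions))

# Decentralized execution: each agent greedily argmaxes its own Q_i.
greedy_actions = per_agent_q.argmax(axis=1)
q_tot_greedy = per_agent_q.max(axis=1).sum()

# Under VDN's additive factorization Q_tot = sum_i Q_i, the per-agent
# greedy joint action also maximizes Q_tot (the Individual-Global-Max
# property), so no run-time coordination between agents is needed.
best_joint_q = -np.inf
for joint_action in np.ndindex(*(n_actions,) * n_agents):
    q = sum(per_agent_q[i, a] for i, a in enumerate(joint_action))
    best_joint_q = max(best_joint_q, q)

assert np.isclose(q_tot_greedy, best_joint_q)
```

QMIX generalizes this by replacing the sum with a learned monotonic mixing network, which preserves the same greedy-execution property while representing a richer class of joint value functions.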