We propose a mechanism for distributed resource management and interference mitigation in wireless networks using multi-agent deep reinforcement learning (RL). We equip each transmitter in the network with a deep RL agent that receives delayed observations from its associated users, exchanges observations with its neighboring agents, and decides which user to serve and what transmit power to use at each scheduling interval. Our proposed framework enables agents to make decisions simultaneously and in a distributed manner, without knowledge of the concurrent decisions of other agents. Moreover, our design of the agents' observation and action spaces is scalable, in the sense that an agent trained on a scenario with a specific number of transmitters and users can be applied to scenarios with different numbers of transmitters and/or users. Simulation results show that our proposed approach outperforms decentralized baselines in terms of the tradeoff between average and $5^{\text{th}}$ percentile user rates, while achieving performance close to, and in certain cases even exceeding, that of a centralized information-theoretic baseline. We also show that our trained agents are robust and maintain their performance gains when the training and test deployments are mismatched.
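To make the evaluation metric concrete, the following is a minimal sketch of how the average and $5^{\text{th}}$ percentile user rates could be computed from one round of simultaneous scheduling and power decisions. It assumes a plain Shannon-rate model with interference treated as noise and a hypothetical gain-matrix layout; the function names `user_rates` and `rate_summary`, the `noise` parameter, and the single-interval simplification are illustrative assumptions and are not taken from the abstract.

```python
import numpy as np


def user_rates(gains, serving, powers, noise=1.0):
    """Per-interval rate (bits/s/Hz) of the user served by each transmitter.

    Assumed (hypothetical) layout:
      gains[j, k] : channel power gain from transmitter j to user k
      serving[i]  : index of the user that transmitter i chose to serve
      powers[i]   : transmit power that transmitter i chose
    """
    gains = np.asarray(gains, dtype=float)
    powers = np.asarray(powers, dtype=float)
    serving = np.asarray(serving)

    # rx[j, i] = power received from transmitter j at the user served by transmitter i
    rx = gains[:, serving] * powers[:, None]
    signal = np.diag(rx)                     # desired-link power
    interference = rx.sum(axis=0) - signal   # power from all other transmitters
    sinr = signal / (noise + interference)
    return np.log2(1.0 + sinr)


def rate_summary(rates):
    """Return (average rate, 5th percentile rate) over the given user rates."""
    rates = np.asarray(rates, dtype=float)
    return float(rates.mean()), float(np.percentile(rates, 5))


if __name__ == "__main__":
    # Toy example: 3 transmitters, 4 users, random gains (illustrative only).
    rng = np.random.default_rng(0)
    gains = rng.exponential(scale=1.0, size=(3, 4))
    serving = [0, 2, 3]          # each agent's user selection
    powers = [1.0, 0.5, 1.0]     # each agent's power selection
    avg, p5 = rate_summary(user_rates(gains, serving, powers))
    print(f"average rate: {avg:.3f}, 5th percentile rate: {p5:.3f}")
```

The average rate reflects overall throughput, while the $5^{\text{th}}$ percentile rate reflects how the worst-served users fare, so reporting both captures the tradeoff referred to above. In practice the rates would be averaged over many scheduling intervals per user before the summary is taken.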