Scheduling plays a pivotal role in multi-user wireless communications, since the quality of service experienced by each user depends largely on the radio resources allocated to it. In this paper, we propose a novel scheduling algorithm with contiguous frequency-domain resource allocation (FDRA), based on deep reinforcement learning (DRL), that jointly selects users and allocates resource blocks (RBs). The scheduling problem is modeled as a Markov decision process, and at each RB allocation step a DRL agent determines which user to schedule and how many consecutive RBs to assign to that user. The state space, action space, and reward function are carefully designed to train the DRL network. More specifically, the originally quasi-continuous action space, which is inherent to contiguous FDRA, is refined into a finite, discrete action space to obtain a trade-off between inference latency and system performance. Simulation results show that the proposed DRL-based scheduling algorithm outperforms other representative baseline schemes while having lower online computational complexity.
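The step-wise allocation described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the candidate RB counts, and the rule that each user is scheduled at most once per slot (so that its RBs stay contiguous) are assumptions made for illustration, and a random policy stands in for the trained DRL agent.

```python
import random
from itertools import product


def build_action_space(num_users, rb_counts):
    """Finite, discrete action set: every (user, number of consecutive RBs) pair.

    rb_counts is a hypothetical list of allowed run lengths; a coarser list
    gives fewer actions (cheaper inference) but less allocation flexibility.
    """
    return list(product(range(num_users), rb_counts))


def allocate(num_rbs, actions, policy):
    """Fill the band left to right, one contiguous run of RBs per step.

    Assumption: a user is scheduled at most once, so its RBs are contiguous.
    Returns a list of (user, first_rb, n_rbs) tuples.
    """
    allocation = []
    scheduled = set()
    next_rb = 0
    while next_rb < num_rbs:
        remaining = num_rbs - next_rb
        # Mask out actions for users that already hold a block of RBs.
        valid = [(u, n) for (u, n) in actions if u not in scheduled]
        if not valid:
            break  # every user is scheduled; leftover RBs stay unallocated
        user, n = policy(next_rb, remaining, valid)
        n = min(n, remaining)  # clip the run at the band edge
        allocation.append((user, next_rb, n))
        scheduled.add(user)
        next_rb += n
    return allocation


if __name__ == "__main__":
    rng = random.Random(0)
    actions = build_action_space(num_users=4, rb_counts=[1, 2, 4, 8])
    # Placeholder policy: a trained agent would pick the action from the state.
    result = allocate(16, actions, lambda rb, rem, valid: rng.choice(valid))
    for user, first_rb, n in result:
        print(f"user {user}: RBs {first_rb}..{first_rb + n - 1}")
```

In this sketch the size of the action set is the number of users times the number of allowed run lengths, which is one way to picture the latency versus performance trade-off the abstract mentions.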