In this paper, we develop algorithms for joint user scheduling and three types of link configuration (relay selection, codebook optimization, and beam tracking) in millimeter wave (mmWave) networks. Our goal is to design an online controller that dynamically schedules users and configures their links to minimize the system delay. To solve this complex scheduling problem, we model it as a dynamic decision-making process and develop two reinforcement learning-based solutions. The first solution is based on deep reinforcement learning (DRL), which leverages proximal policy optimization (PPO) to train a neural network-based policy. Due to the potentially high sample complexity of DRL, we also propose an empirical multi-armed bandit (MAB)-based solution, which decomposes the decision-making process into a sequence of sub-actions and exploits classic MaxWeight scheduling and Thompson sampling to decide those sub-actions. Our evaluation of the proposed solutions confirms their effectiveness in achieving acceptable system delay. It also shows that the DRL-based solution achieves better delay performance, while the MAB-based solution trains faster.
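As a rough illustration of the two classic building blocks the MAB-based solution draws on, the sketch below combines MaxWeight user scheduling with Beta-Bernoulli Thompson sampling over a discrete sub-action set (e.g., candidate relays). The queue values, rates, arm count, and reward model are hypothetical placeholders, not the paper's actual system model.

```python
import random

def maxweight_schedule(queues, rates):
    """MaxWeight: pick the user maximizing queue backlog x service rate."""
    return max(range(len(queues)), key=lambda u: queues[u] * rates[u])

class ThompsonSampler:
    """Beta-Bernoulli Thompson sampling over a discrete sub-action set
    (e.g., candidate relays or codebook configurations)."""
    def __init__(self, n_arms):
        self.successes = [1] * n_arms  # Beta(1, 1) prior per arm
        self.failures = [1] * n_arms

    def select(self):
        # Sample a success probability per arm; play the highest sample.
        samples = [random.betavariate(s, f)
                   for s, f in zip(self.successes, self.failures)]
        return max(range(len(samples)), key=samples.__getitem__)

    def update(self, arm, reward):
        # Binary reward, e.g., whether the transmission succeeded.
        if reward:
            self.successes[arm] += 1
        else:
            self.failures[arm] += 1

# Illustrative sub-action sequence: schedule a user, then pick a relay.
user = maxweight_schedule(queues=[5, 2, 9], rates=[1.0, 2.0, 0.8])
relay_picker = ThompsonSampler(n_arms=4)
relay = relay_picker.select()
relay_picker.update(relay, reward=1)  # assume this attempt succeeded
```

In the decomposed decision process described above, each sub-action (relay choice, codebook configuration, beam update) would maintain its own bandit state, with MaxWeight handling the user-scheduling step.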