We study the problem of real-time scheduling in a multi-hop millimeter-wave (mmWave) mesh. We develop a model-free deep reinforcement learning algorithm called Adaptive Activator RL (AARL), which determines the subset of mmWave links to activate during each time slot and the power level for each activated link. The most important property of AARL is its ability to make scheduling decisions within the strict time slot constraints of typical 5G mmWave networks. AARL can handle a variety of network topologies, network loads, and interference models, and it can adapt to different workloads. We demonstrate the operation of AARL on several topologies: a small topology with 10 links, a moderately sized mesh with 48 links, and a large topology with 96 links. For each topology, we compare the throughput obtained by AARL to that of a benchmark algorithm called RPMA (Residual Profit Maximizer Algorithm). The most important advantage of AARL over RPMA is speed: AARL can make the necessary scheduling decisions within every time slot, while RPMA cannot. In addition, the quality of the scheduling decisions made by AARL exceeds that of RPMA.
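The abstract does not describe AARL's internal architecture, but the per-slot decision it must make (an activation choice and a power level for every link) can be illustrated with a minimal policy-network sketch. The sketch below is a hypothetical illustration, not the paper's implementation: all names (`SchedulerPolicy`), layer sizes, feature dimensions, and the choice of PyTorch are our assumptions.

```python
# Hypothetical sketch of a per-slot mmWave link-scheduling policy.
# NOT the paper's AARL implementation; names and sizes are illustrative.
import torch
import torch.nn as nn

class SchedulerPolicy(nn.Module):
    """Maps per-link state features to (activation probability, power-level distribution)."""
    def __init__(self, n_links: int, feat_dim: int, n_power_levels: int):
        super().__init__()
        self.n_links = n_links
        self.n_power_levels = n_power_levels
        self.backbone = nn.Sequential(
            nn.Linear(n_links * feat_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
        )
        # One activation logit per link, plus a power-level distribution per link.
        self.activation_head = nn.Linear(256, n_links)
        self.power_head = nn.Linear(256, n_links * n_power_levels)

    def forward(self, link_state: torch.Tensor):
        # link_state: (batch, n_links, feat_dim), e.g. queue backlog, SNR, recent rate.
        h = self.backbone(link_state.flatten(start_dim=1))
        activate = torch.sigmoid(self.activation_head(h))  # (batch, n_links)
        power_logits = self.power_head(h).view(-1, self.n_links, self.n_power_levels)
        power = torch.softmax(power_logits, dim=-1)        # per-link power distribution
        return activate, power

# One forward pass corresponds to one scheduling decision for one time slot.
policy = SchedulerPolicy(n_links=48, feat_dim=4, n_power_levels=3)
state = torch.randn(1, 48, 4)
activate, power = policy(state)
print(activate.shape, power.shape)  # torch.Size([1, 48]) torch.Size([1, 48, 3])
```

The point this sketch makes is the one the abstract emphasizes: once trained, a forward pass is a handful of matrix multiplications, which can plausibly fit within a 5G NR slot (roughly 0.125 ms to 1 ms, depending on numerology), whereas a combinatorial optimizer such as RPMA may not.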