Learning Robust Scheduling with Search and Attention

Allocating physical layer resources to users based on channel quality, buffer size, requirements, and constraints is one of the central optimization problems in radio resource management. The solution space grows combinatorially with the cardinality of each dimension, making it hard to find optimal solutions with exhaustive search, or even with classical optimization algorithms, under stringent time requirements. The problem is even more pronounced in MU-MIMO scheduling, where the scheduler can assign multiple users to the same time-frequency physical resources. Traditional approaches therefore resort to heuristics that trade optimality for feasibility of execution. In this work, we treat the MU-MIMO scheduling problem as a tree-structured combinatorial problem and, borrowing from the recent successes of AlphaGo Zero, investigate the feasibility of searching for the best-performing solutions using a combination of Monte Carlo Tree Search and Reinforcement Learning. To cater to the nature of the problem at hand, such as the lack of an intrinsic ordering of the users and the importance of dependencies between combinations of users, we make fundamental modifications to the neural network architecture by introducing the self-attention mechanism. We then demonstrate that the resulting approach is not only feasible but also vastly outperforms state-of-the-art heuristic-based scheduling approaches in the presence of measurement uncertainties and finite buffers.
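
To give a rough sense of the combinatorial growth mentioned above, the following sketch counts candidate allocations under a deliberately simplified model. The model is an assumption for illustration only and is not the formulation used in this work: each resource block independently selects any set of at most `max_layers` co-scheduled users, and buffers and channel state are ignored. The function names and parameters are hypothetical.

```python
from math import comb


def groups_per_block(num_users: int, max_layers: int) -> int:
    """Count the user sets (including the empty set) that may share one block.

    Assumption: any subset of at most `max_layers` users is a valid
    MU-MIMO group on a single time-frequency resource block.
    """
    return sum(comb(num_users, k) for k in range(max_layers + 1))


def schedule_space_size(num_users: int, num_blocks: int, max_layers: int) -> int:
    """Count joint allocations when every block picks its group independently.

    Assumption: blocks are independent, so the total is the per-block
    count raised to the number of blocks.
    """
    return groups_per_block(num_users, max_layers) ** num_blocks


if __name__ == "__main__":
    # Illustrative sizes only; they are not taken from the paper.
    for users in (4, 8, 16):
        size = schedule_space_size(users, num_blocks=10, max_layers=4)
        print(f"{users} users -> {size} candidate allocations")
```

Even in this toy setting the count grows quickly with both the number of users and the number of blocks, which is why exhaustive enumeration is impractical under tight scheduling deadlines.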
