In orthogonal multiple access (OMA) systems, the number of served user equipments (UEs) is limited by the number of available orthogonal resources. In contrast, non-orthogonal multiple access (NOMA) schemes allow multiple UEs to share the same orthogonal resource. This extra degree of freedom introduces new challenges for resource allocation. Buffer state information (BSI), such as the size and age of packets waiting for transmission, can be used to improve scheduling in OMA systems. In this paper, we investigate the impact of BSI on the performance of a centralized scheduler in an uplink multi-carrier NOMA scenario in which UEs have various data rate and latency requirements. To handle the large combinatorial space of allocating UEs to resources, we propose a novel scheduler based on actor-critic reinforcement learning that incorporates BSI. Training and evaluation are carried out using Nokia's "wireless suite". We propose several novel techniques to both stabilize and speed up training. The proposed scheduler outperforms benchmark schedulers.
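As an illustration of what BSI might look like in practice, the following minimal sketch computes the two quantities the abstract names, the size and the age of packets waiting for transmission, for a single UE. The class names `Packet` and `UEBuffer`, the use of transmission time intervals (TTIs) as the time unit, and the FIFO buffer are assumptions made for this example only; they are not taken from the paper or from Nokia's "wireless suite".

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Tuple


@dataclass
class Packet:
    size_bits: int    # payload size in bits (hypothetical unit)
    arrival_tti: int  # TTI at which the packet entered the buffer


@dataclass
class UEBuffer:
    """Hypothetical per-UE FIFO buffer; packets are appended in arrival order."""
    packets: Deque[Packet] = field(default_factory=deque)

    def push(self, size_bits: int, arrival_tti: int) -> None:
        self.packets.append(Packet(size_bits, arrival_tti))

    def bsi(self, now_tti: int) -> Tuple[int, int]:
        """Return (total queued bits, age of the oldest packet in TTIs)."""
        if not self.packets:
            return 0, 0
        total_bits = sum(p.size_bits for p in self.packets)
        # FIFO order means the first packet in the deque is the oldest one.
        oldest_age = now_tti - self.packets[0].arrival_tti
        return total_bits, oldest_age


buf = UEBuffer()
buf.push(size_bits=1200, arrival_tti=3)
buf.push(size_bits=800, arrival_tti=7)
features = buf.bsi(now_tti=10)
print(features)
```

A scheduler could concatenate such per-UE pairs into its observation vector; how the proposed scheduler actually encodes BSI is not specified in the abstract.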