Deep Reinforcement Learning for Wireless Scheduling with Multiclass Services

In this paper, we investigate the problem of scheduling and resource allocation over a time-varying set of clients with heterogeneous demands. In this context, a service provider has to schedule traffic destined for users with different classes of requirements and allocate bandwidth resources over time so as to satisfy service demands efficiently within a limited time horizon. This is a highly intricate problem, particularly in wireless communication systems, and solutions may involve tools from diverse fields, including combinatorics and constrained optimization. Although recent work has successfully proposed solutions based on Deep Reinforcement Learning (DRL), the challenging setting of heterogeneous user traffic and demands has not been addressed. We propose a deep deterministic policy gradient algorithm that combines state-of-the-art techniques, namely Distributional RL and Deep Sets, to train a model for heterogeneous traffic scheduling. We test on diverse scenarios with different time-dependence dynamics, user requirements, and available resources, and obtain consistent results. We evaluate the algorithm in a wireless communication setting using both synthetic and real data and show significant gains in Quality of Service (QoS), as defined by the service classes, over state-of-the-art conventional algorithms from combinatorics, optimization, and scheduling metrics (e.g., Knapsack, Integer Linear Programming, Frank-Wolfe, Exponential Rule).
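
The abstract names Knapsack among the conventional baselines but does not say how it is formulated. As a rough illustration only, one scheduling slot can be read as a 0/1 knapsack: each user needs an integer number of resource units and yields some utility if served, and the scheduler picks the subset with the highest total utility within the bandwidth budget. The function name `knapsack_schedule`, the integer costs, and the per-user utilities are assumptions made for this sketch; they are not taken from the paper.

```python
from typing import List, Tuple


def knapsack_schedule(
    costs: List[int], values: List[float], budget: int
) -> Tuple[float, List[int]]:
    """Hypothetical single-slot baseline: 0/1 knapsack over users.

    costs[i]  -- resource units user i would consume (assumed integer)
    values[i] -- utility gained if user i is served (assumed given)
    budget    -- total resource units available in this slot
    Returns the best total utility and the indices of the served users.
    """
    n = len(costs)
    # table[i][b] = best utility using only the first i users and b units.
    table = [[0.0] * (budget + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost, value = costs[i - 1], values[i - 1]
        for b in range(budget + 1):
            table[i][b] = table[i - 1][b]  # option 1: skip user i-1
            if cost <= b and table[i - 1][b - cost] + value > table[i][b]:
                table[i][b] = table[i - 1][b - cost] + value  # option 2: serve

    # Walk the table backwards to recover which users were served.
    chosen: List[int] = []
    b = budget
    for i in range(n, 0, -1):
        if table[i][b] != table[i - 1][b]:
            chosen.append(i - 1)
            b -= costs[i - 1]
    chosen.reverse()
    return table[n][budget], chosen


# Three users competing for 5 resource units.
best_value, served = knapsack_schedule([3, 4, 2], [4.0, 5.0, 3.0], 5)
print(best_value, served)
```

A per-slot formulation like this ignores deadlines and future arrivals, which is one reason a learned policy could in principle do better over a time horizon.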
