End-to-end delay is a critical attribute of quality of service (QoS) in application domains such as cloud computing and computer networks. This metric is particularly important in tandem service systems, where the end-to-end service is provided through a chain of services. Service-rate control is a common mechanism for providing QoS guarantees in service systems. In this paper, we introduce a reinforcement learning-based (RL-based) service-rate controller that provides probabilistic upper bounds on the end-to-end delay of the system while preventing the overuse of service resources. To keep the framework general, we use queueing theory to model the service systems, but we adopt an RL-based approach to avoid the limitations of queueing-theoretic methods. In particular, we use Deep Deterministic Policy Gradient (DDPG) to learn the service rates (action) as a function of the queue lengths (state) in tandem service systems. In contrast to existing RL-based methods that quantify their performance by the achieved overall reward, which can be hard to interpret or even misleading, our proposed controller provides explicit probabilistic guarantees on the end-to-end delay of the system. We evaluate the controller on a tandem queueing system with non-exponential inter-arrival and service times; the results validate its capability to meet QoS constraints.
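To make the state-action mapping concrete, the sketch below shows a minimal DDPG-style actor that maps observed queue lengths (state) in a tandem system to per-station service rates (action). This is an illustrative assumption, not the authors' implementation: the class name `Actor`, the parameters `n_stations` and `max_rate`, and the network sizes are all hypothetical choices made for the example.

```python
# Minimal sketch (assumed, not the paper's code) of a DDPG actor network:
# input = queue lengths of the tandem stations, output = service rates.
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Deterministic policy: queue lengths (state) -> service rates (action)."""
    def __init__(self, n_stations: int, max_rate: float, hidden: int = 64):
        super().__init__()
        self.max_rate = max_rate  # assumed upper cap on each station's rate
        self.net = nn.Sequential(
            nn.Linear(n_stations, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_stations),
            nn.Sigmoid(),  # squash to (0, 1), then scale to (0, max_rate)
        )

    def forward(self, queue_lengths: torch.Tensor) -> torch.Tensor:
        # One service rate per station, bounded to prevent resource overuse.
        return self.max_rate * self.net(queue_lengths)

# Usage: three stations in tandem; the controller observes queue lengths
# and outputs the service rates it would apply at each station.
actor = Actor(n_stations=3, max_rate=10.0)
state = torch.tensor([[4.0, 1.0, 7.0]])  # current queue lengths
service_rates = actor(state)
print(service_rates)
```

In full DDPG training, this actor would be paired with a critic network and updated via the deterministic policy gradient; the sketch only illustrates how queue lengths play the role of the state and service rates the role of the action.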