In multi-server queueing systems where there is no central queue holding all incoming jobs, job dispatching policies are used to assign incoming jobs to the queue at one of the servers. Classic job dispatching policies such as join-the-shortest-queue and shortest expected delay assume that the service rates and queue lengths of the servers are known to the dispatcher. In this work, we tackle the problem of job dispatching without the knowledge of service rates and queue lengths, where the dispatcher can only obtain noisy estimates of the service rates by observing job departures. This problem presents a novel exploration-exploitation trade-off between sending jobs to all the servers to estimate their service rates, and exploiting the currently known fastest servers to minimize the expected queueing delay. We propose a bandit-based exploration policy that learns the service rates from observed job departures. Unlike the standard multi-armed bandit problem where only one out of a finite set of actions is optimal, here the optimal policy requires identifying the optimal fraction of incoming jobs to be sent to each server. We present a regret analysis and simulations to demonstrate the effectiveness of the proposed bandit-based exploration policy.
翻译:在没有集中队列的多服务器队列系统中,没有集中队列持有所有到来的工作,因此采用工作派遣政策将新到的工作分配到一个服务器的队列中。典型的工作派遣政策,如加入最短的阵列和最短的预期延迟,假定调度员知道服务器的服务率和队列长度。在这项工作中,我们处理在没有服务率和队列长度知识的情况下派遣工作的问题,调度员只能通过观察工作离开而获得对服务率的响亮估计。这个问题是向所有服务器发送工作以估计其服务率和利用目前已知最快的最快的服务器以尽量减少预期的排队延迟之间的新探索-探索性交换。我们提议了一个以团列为基础的探索政策,从观察到的离职中学习服务率。与标准多臂带问题不同,因为只有一组有限的行动最理想,这里的最佳政策要求确定每个服务器接收到的工作的最佳比例。我们提出遗憾分析和模拟,以证明拟议以团列勘探政策的有效性。