We consider a large-scale service system where incoming tasks have to be instantaneously dispatched to one out of many parallel server pools. The user-perceived performance degrades with the number of concurrent tasks and the dispatcher aims at maximizing the overall quality-of-service by balancing the load through a simple threshold policy. We demonstrate that such a policy is optimal on the fluid and diffusion scales, while only involving a small communication overhead, which is crucial for large-scale deployments. In order to set the threshold optimally, it is important, however, to learn the load of the system, which may be unknown. For that purpose, we design a control rule for tuning the threshold in an online manner. We derive conditions which guarantee that this adaptive threshold settles at the optimal value, along with estimates for the time until this happens. In addition, we provide numerical experiments which support the theoretical results and further indicate that our policy copes effectively with time-varying demand patterns.
翻译:暂无翻译