The Join-the-Shortest-Queue (JSQ) load balancing scheme is known to minimise the average response time of jobs in homogeneous systems with identical servers. However, for {\em heterogeneous} systems, in which servers have different processing speeds, finding an optimal load balancing scheme remains an open problem for finite system sizes. Recently, a variant of the JSQ scheme for heterogeneous systems, called the {\em Speed-Aware-Join-the-Shortest-Queue (SA-JSQ)} scheme, has been shown to be asymptotically optimal in the fluid-scaling regime, where the number of servers $n$ tends to infinity while the normalised arrival rate of jobs remains constant. In this paper, we show that the SA-JSQ scheme is also asymptotically optimal for heterogeneous systems in the {\em Halfin-Whitt} traffic regime, where the normalised arrival rate scales as $1-O(1/\sqrt{n})$. Our analysis begins by establishing that an appropriately scaled and centered version of the Markov process describing the system dynamics converges weakly to a two-dimensional reflected {\em Ornstein-Uhlenbeck (OU) process}. We then use {\em Stein's method} to show that the stationary distribution of the underlying Markov process converges to that of the OU process as the system size increases, thereby establishing the validity of the interchange of limits. Finally, through coupling with a suitably constructed system, we show that, in the Halfin-Whitt regime, SA-JSQ asymptotically minimises the diffusion-scaled total number of jobs and the diffusion-scaled number of waiting jobs in steady state among all policies that dispatch jobs based on queue lengths and server speeds.