In this paper, we study scheduling of a queueing system with zero knowledge of instantaneous network conditions. We consider a one-hop single-server queueing system consisting of $K$ queues, each with time-varying and non-stationary arrival and service rates. Our scheduling approach combines adversarial bandit learning with Lyapunov drift minimization and requires no knowledge of the instantaneous network state (the arrival and service rates) of each queue. We present two novel algorithms, \texttt{SoftMW} (SoftMaxWeight) and \texttt{SSMW} (Sliding-window SoftMaxWeight), both capable of stabilizing any system that can be stabilized by some (possibly unknown) sequence of randomized policies whose time-variation satisfies a mild condition. We further generalize our results to the setting where arrivals and departures only have bounded moments instead of being deterministically bounded, and propose \texttt{SoftMW+} and \texttt{SSMW+}, which are capable of stabilizing the system in this setting. As a building block of our new algorithms, we also extend the classical \texttt{EXP3.S} algorithm (Auer et al., 2002) for multi-armed bandits to handle unboundedly large feedback signals, which may be of independent interest.
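To make the bandit building block concrete, the following is a minimal sketch of the classical \texttt{EXP3.S} update of Auer et al. (2002) for rewards in $[0,1]$. It is not \texttt{SoftMW}, \texttt{SSMW}, or the unbounded-feedback extension described above. The function names, the parameter values, and the toy reward model (one "good" arm that switches halfway through) are all assumptions made for illustration.

```python
import math
import random


def exp3s_probabilities(weights, gamma):
    """Mix the normalized weights with uniform exploration."""
    k = len(weights)
    total = sum(weights)
    return [(1.0 - gamma) * w / total + gamma / k for w in weights]


def exp3s_update(weights, probs, arm, reward, gamma, alpha):
    """One EXP3.S step: exponential update plus a weight-sharing term.

    Only the pulled arm gets a nonzero importance-weighted estimate.
    The sharing term (e * alpha / k) * sum(weights) keeps every weight
    from collapsing, which lets the learner track a changing best arm.
    """
    k = len(weights)
    share = math.e * alpha / k * sum(weights)
    new_weights = []
    for i, w in enumerate(weights):
        estimate = reward / probs[arm] if i == arm else 0.0
        new_weights.append(w * math.exp(gamma * estimate / k) + share)
    return new_weights


def run_demo(k=4, horizon=2000, gamma=0.1, alpha=0.001, seed=0):
    """Toy non-stationary bandit (hypothetical reward model)."""
    rng = random.Random(seed)
    weights = [1.0] * k
    pulls_second_half = [0] * k
    for t in range(horizon):
        good_arm = 0 if t < horizon // 2 else k - 1
        probs = exp3s_probabilities(weights, gamma)
        arm = rng.choices(range(k), weights=probs)[0]
        # Bernoulli reward: 0.9 success rate on the good arm, 0.1 elsewhere.
        success_rate = 0.9 if arm == good_arm else 0.1
        reward = 1.0 if rng.random() < success_rate else 0.0
        weights = exp3s_update(weights, probs, arm, reward, gamma, alpha)
        # Rescaling leaves the probabilities unchanged and avoids overflow.
        total = sum(weights)
        weights = [w / total for w in weights]
        if t >= horizon // 2:
            pulls_second_half[arm] += 1
    return pulls_second_half


if __name__ == "__main__":
    print(run_demo())
```

Calling `run_demo()` returns how often each arm was pulled in the second half of the horizon, after the good arm has changed. Rescaling the weights at every step is a numerical convenience rather than part of the algorithm: both the exponential update and the sharing term scale linearly with the weights, so the sampling distribution is unaffected.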