In this paper, we study a multi-class, multi-server queueing system in which job-server assignments yield stochastic rewards that follow a bilinear model in the feature vectors representing jobs and servers. Our goal is regret minimization against an oracle policy that has complete information about the system parameters. We propose a scheduling algorithm that combines a linear bandit algorithm with dynamic allocation of jobs to servers. For the baseline setting, in which mean job service times are identical for all jobs, we show that our algorithm has regret that is sub-linear in the time horizon, together with a sub-linear bound on the mean queue length. We further show that similar bounds hold under more general assumptions, allowing for non-identical mean job service times across job classes and a time-varying set of server classes. We also show that better regret and mean queue length bounds can be guaranteed by an algorithm that has access to the traffic intensities of the job classes. We present numerical experiments showing how the regret and mean queue length of our algorithms depend on various system parameters, and we compare their performance against a previously proposed algorithm using both synthetic, randomly generated data and a real-world cluster computing data trace.
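The link between a bilinear reward model and a linear bandit algorithm can be illustrated with a minimal sketch. It assumes a mean reward of the form x^T Θ y, where x is a job feature vector, y is a server feature vector, and Θ is an unknown parameter matrix. These symbols, the function names, and the numbers are all hypothetical and are not taken from the paper. The sketch only checks the standard identity that a bilinear function of x and y is linear in the flattened outer product of x and y; whether the proposed algorithm uses exactly this reduction is not stated in the abstract.

```python
# Illustrative sketch only: x, y, theta and all helper names are assumptions,
# not notation or code from the paper.

def bilinear_reward(x, theta, y):
    """Mean reward x^T Theta y for job features x and server features y."""
    return sum(x[i] * theta[i][j] * y[j]
               for i in range(len(x)) for j in range(len(y)))

def flatten_outer(x, y):
    """Row-major flattening of the outer product x y^T."""
    return [xi * yj for xi in x for yj in y]

def flatten_matrix(theta):
    """Row-major flattening of Theta, matching flatten_outer's ordering."""
    return [v for row in theta for v in row]

def linear_reward(x, theta, y):
    """The same mean reward written as the inner product <vec(Theta), vec(x y^T)>."""
    return sum(t * z for t, z in zip(flatten_matrix(theta), flatten_outer(x, y)))

# Made-up example values for one job-server pair.
x = [1.0, 2.0]
y = [0.5, -1.0, 3.0]
theta = [[0.2, 0.0, 0.1],
         [0.4, -0.3, 0.0]]

# Both forms agree up to floating-point rounding.
gap = abs(bilinear_reward(x, theta, y) - linear_reward(x, theta, y))
```

Under this view, each candidate job-server pair plays the role of an arm whose feature vector is the flattened outer product, and the flattened Θ is the unknown parameter that a linear bandit method would estimate.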