This paper proposes a learning and scheduling algorithm to minimize the expected cumulative holding cost incurred by jobs, where statistical parameters defining their individual holding costs are unknown a priori. In each time slot, the server can process a job while receiving the realized random holding costs of the jobs remaining in the system. Our algorithm is a learning-based variant of the $c\mu$ rule for scheduling: it starts with a preemption period of fixed length which serves as a learning phase, and after accumulating enough data about individual jobs, it switches to nonpreemptive scheduling mode. The algorithm is designed to handle instances with large or small gaps in jobs' parameters and achieves near-optimal performance guarantees. The performance of our algorithm is captured by its regret, where the benchmark is the minimum possible cost attained when the statistical parameters of jobs are fully known. We prove upper bounds on the regret of our algorithm, and we derive a regret lower bound that is almost matching the proposed upper bounds. Our numerical results demonstrate the effectiveness of our algorithm and show that our theoretical regret analysis is nearly tight.
翻译:本文建议了一种学习和时间安排算法,以尽量减少工作产生的预期累积持有成本,在这种算法中,确定个人持有成本的统计参数是事先不为人知的。在每一个时间段中,服务器可以处理一个工作,同时获得系统剩余工作的实际随机持有成本。我们的算法是一种基于学习的变体,是用于时间安排的$c\mu美元规则的变体:它从一个固定长度的先发制人时期开始,作为学习阶段,在积累了足够的个别工作数据之后,它转而采用非先发制人的安排模式。这个算法旨在处理工作参数存在巨大或小差距的情况,并实现近于最佳的绩效保障。我们的算法表现是遗憾的,因为基准是完全了解工作统计参数时可能达到的最低成本。我们证明了我们的算法的遗憾程度,我们得到的遗憾程度更低的限度几乎与提议的上限一致。我们的数字结果表明我们的算法的有效性,并表明我们的理论遗憾分析几乎是紧凑的。