Motivated by applications such as machine repair, project monitoring, and anti-poaching patrol scheduling, we study intervention planning for stochastic processes under resource constraints. This planning problem has previously been modeled as a restless multi-armed bandit (RMAB), where each arm is an intervention-dependent Markov Decision Process. However, the existing literature assumes all intervention resources belong to a single uniform pool, limiting its applicability to real-world settings where interventions are carried out by a set of workers, each with their own costs, budgets, and intervention effects. In this work, we consider a novel RMAB setting, called multi-worker restless bandits (MWRMAB), with heterogeneous workers. The goal is to plan an intervention schedule that maximizes the expected reward while satisfying budget constraints on each worker as well as fairness in the load assigned to each worker. Our contributions are two-fold: (1) we provide a multi-worker extension of the Whittle index to handle heterogeneous costs and per-worker budgets, and (2) we develop an index-based scheduling policy to achieve fairness. Further, we evaluate our method on various cost structures and show that it significantly outperforms baselines in terms of fairness without sacrificing much accumulated reward.