Restless multi-armed bandits are often used to model budget-constrained resource allocation tasks where receipt of the resource is associated with an increased probability of a favorable state transition. Prior work assumes that individual arms only benefit if they receive the resource directly. However, many allocation tasks occur within communities and can be characterized by positive externalities that allow arms to derive partial benefit when their neighbor(s) receive the resource. We thus introduce networked restless bandits, a novel multi-armed bandit setting in which arms are both restless and embedded within a directed graph. We then present Greta, a graph-aware, Whittle index-based heuristic algorithm that can be used to efficiently construct a constrained reward-maximizing action vector at each timestep. Our empirical results demonstrate that Greta outperforms comparison policies across a range of hyperparameter values and graph topologies.
翻译:无休止的多武装匪徒往往被用来模拟预算紧张的资源分配任务,在这种任务中,获得资源与有利的国家过渡的可能性增加有关。先前的工作假设,个人武器只有在直接获得资源的情况下才会受益。然而,许多分配任务发生在社区内部,其特点是积极的外差因素,使武器在邻居收到资源时能够部分受益。我们因此引进了网络化的无休止的匪徒,这是一种新型的多武装匪徒环境,其中武器既无休止,又嵌入一个定向图表。我们然后介绍Greta,一种图形敏锐的、基于惠特尔指数的超值算法,可以用来在每一个阶段高效地构建一个有限制的最大化行动矢量。我们的经验结果表明,Greta超越了对一系列超参数值和图示表的比较政策。