Restless Multi-Armed Bandits (RMABs) have been popularly used to model limited resource allocation problems. Recently, these have been employed for health monitoring and intervention planning problems. However, the existing approaches fail to account for the arrival of new patients and the departure of enrolled patients from a treatment program. To address this challenge, we formulate a streaming bandit (S-RMAB) framework, a generalization of RMABs where heterogeneous arms arrive and leave under possibly random streams. We propose a new and scalable approach to computing index-based solutions. We start by proving that index values decrease for short residual lifetimes, a phenomenon that we call index decay. We then provide algorithms designed to capture index decay without having to solve the costly finite horizon problem, thereby lowering the computational complexity compared to existing methods.We evaluate our approach via simulations run on real-world data obtained from a tuberculosis intervention planning task as well as multiple other synthetic domains. Our algorithms achieve an over 150x speed-up over existing methods in these tasks without loss in performance. These findings are robust across multiple domains.
翻译:多装甲猛匪(RMABs)被广泛用来模拟有限的资源分配问题。最近,这些被用于健康监测和干预规划问题。然而,现有办法没有考虑到新病人的到来和注册病人离开治疗方案的情况。为了应对这一挑战,我们制定了一条流式强盗(S-RMAB)框架,在混杂武器到达并离开可能随机流的情况下,对RMABs进行概括化。我们提出了一种新的和可扩展的方法来计算基于指数的解决办法。我们首先证明指数值在短的剩余寿命期间会下降,这是一种我们称之为指数衰变的现象。我们随后提供了旨在捕捉指数衰败的算法,而不必解决昂贵的有限地平线问题,从而降低计算的复杂性。我们通过模拟从结核病干预规划任务和其他多种合成领域获得的实际世界数据来评估我们的方法。我们的算法在不造成绩效损失的情况下,在这些任务的现有方法上取得了150x速度的加速率。这些发现在多个领域是稳健的。